arXiv:1507.07055v3 [math.ST] 3 Mar 2017 


Submitted to Bernoulli 


Inference in Ising Models 

BHASWAR B. BHATTACHARYA and SUMIT MUKHERJEE 

Department of Statistics 
University of Pennsylvania 
Philadelphia, USA 

E-mail: bhaswar@wharton.upenn.edu 

Department of Statistics 
Columbia University 
New York, USA 
E-mail: sm3949@columbia.edu 

The Ising spin glass is a one-parameter exponential family model for binary data with quadratic 
sufficient statistic. In this paper, we show that given a single realization from this model, the 
maximum pseudolikelihood estimate (MPLE) of the natural parameter is $\sqrt{a_N}$-consistent at a 
point whenever the log-partition function has order $a_N$ in a neighborhood of that point. This 
gives consistency rates of the MPLE for ferromagnetic Ising models on general weighted graphs 
in all regimes, extending the results of Chatterjee (Ann. Statist. 35 (2007) 1931-1946) where 
only $\sqrt{N}$-consistency of the MPLE was shown. It is also shown that consistent testing, and 
hence estimation, is impossible in the high temperature phase in ferromagnetic Ising models on 
a converging sequence of simple graphs, which include the Curie-Weiss model. In this regime, 
the sufficient statistic is distributed as a weighted sum of independent $\chi^2_1$ random variables, and 
the asymptotic power of the most powerful test is determined. We also illustrate applications of 
our results on synthetic and real-world network data. 

1. Introduction 

The Ising spin glass is a discrete random field developed in statistical physics as a model 
for ferromagnetism [23], and is now widely used in statistics as a model for binary data 
with applications in spatial modeling, image processing, and neural networks (cf. [2, 20, 
22] and the references therein). To describe the model, suppose that the data is a vector 
of dependent ±1 random variables $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_N)$, and the dependence among the 
coordinates of $\sigma$ is modeled by a one-parameter exponential family where the sufficient 
statistic is a quadratic form: 

$$H_N(\tau) := \tau' J_N \tau = \sum_{1 \le i, j \le N} J_N(i,j)\,\tau_i \tau_j \qquad (1.1)$$

for any $\tau \in S_N := \{-1,1\}^N$ and an $N \times N$ symmetric matrix $J_N$ with zeros on the 
diagonal. The elements of $J_N$ are denoted by $J_N(i,j) = J_N(j,i)$, for $1 \le i < j \le N$. 

Given any $\beta > 0$, the quadratic form (1.1) defines a parametric family of probability 
distributions on $S_N$: 

$$\mathbb{P}_\beta(\sigma = \tau) = 2^{-N} \exp\left\{\tfrac{1}{2}\beta H_N(\tau) - F_N(\beta)\right\}, \qquad (1.2)$$

where $F_N(\beta)$ is the log-partition function, which is determined by the condition 
$\sum_{\tau \in S_N} \mathbb{P}_\beta(\sigma = \tau) = 1$, that is, 

$$F_N(\beta) := \log\left\{\frac{1}{2^N}\sum_{\tau \in S_N} e^{\frac{1}{2}\beta H_N(\tau)}\right\} = \log \mathbb{E}_0\, e^{\frac{1}{2}\beta H_N(\sigma)}, \qquad (1.3)$$

where $\mathbb{E}_0$ denotes the expectation over $\sigma$ distributed as $\mathbb{P}_0$, the uniform measure on $S_N$. 
The parameter $\beta = 1/T$ is often referred to as the inverse temperature, so the high 
temperature regime corresponds to small values of $\beta$. The family (1.2) includes many 
famous statistical physics models: the usual ferromagnetic Ising model on general graphs, 
the Sherrington-Kirkpatrick mean-field model [30, 33, 34], and the Hopfield model for 
neural networks [22]. 

Estimating the parameter $\beta$ in (1.2), given one realization from the model, is extremely 
difficult using likelihood-based methods because of the presence of an intractable 
normalizing constant $F_N(\beta)$ in the likelihood. A variety of numerical methods are known 
for approximately computing the likelihood [18], but they are computationally expensive 
and very little is known about the rate of convergence. 
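
To make the computational cost concrete, the following Python sketch evaluates the log-partition function by brute-force enumeration of all $2^N$ spin configurations, which is feasible only for very small $N$. It assumes the normalization $F_N(\beta) = \log \mathbb{E}_0 \exp(\frac{1}{2}\beta H_N(\sigma))$ with $H_N(\tau) = \tau' J_N \tau$; the function names and the Curie-Weiss example matrix are illustrative choices and are not taken from the paper.

```python
import itertools
import math


def hamiltonian(J, tau):
    """Quadratic form H_N(tau) = tau' J tau for a symmetric J with zero diagonal."""
    n = len(tau)
    return sum(J[i][j] * tau[i] * tau[j] for i in range(n) for j in range(n))


def log_partition(J, beta):
    """F_N(beta) = log E_0 exp(beta * H_N(sigma) / 2), by enumerating {-1, 1}^N.

    The normalization is an assumption of this sketch. The enumeration visits
    2^N configurations, so it is only usable for N up to roughly 20.
    """
    n = len(J)
    exponents = [0.5 * beta * hamiltonian(J, tau)
                 for tau in itertools.product((-1, 1), repeat=n)]
    # log-sum-exp for numerical stability, then subtract log(2^N) to account
    # for the uniform reference measure P_0.
    top = max(exponents)
    return top + math.log(sum(math.exp(e - top) for e in exponents)) - n * math.log(2)


def curie_weiss(n):
    """Curie-Weiss coupling matrix: J(i, j) = 1/n off the diagonal."""
    return [[0.0 if i == j else 1.0 / n for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    J = curie_weiss(10)
    for beta in (0.0, 0.5, 1.0, 2.0):
        print(beta, log_partition(J, beta))
```

Each additional spin doubles the number of terms in the sum, which is why estimators that avoid $F_N(\beta)$ altogether are attractive.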

One alternative to using likelihood-based methods is to consider the maximum 
pseudolikelihood estimator (MPLE) [4, 5]. Chatterjee [10] showed that given a single spin 
configuration from the model (1.2), the MPLE $\hat\beta_N$ is $\sqrt{N}$-consistent at $\beta = \beta_0$, whenever 
$\liminf_{N\to\infty} \frac{1}{N} F_N(\beta_0) > 0$. However, in many popular models such as regular graphs, 
random graphs, and dense graphs, the log-partition function $F_N(\beta) = o(N)$ for certain 
ranges of $\beta$, and Chatterjee's result does not tell us anything about the consistency of 
the MPLE. 

In this paper, we show that the MPLE is $\sqrt{a_N}$-consistent at $\beta = \beta_0$, if the log-partition 
function has order $a_N$ in a neighborhood of $\beta_0$ (Theorem 2.1), for a sequence $a_N \to \infty$. 
This gives the consistency rate of the MPLE for all values of $\beta > 0$ away from the 
critical points, and shows that the rate of the MPLE undergoes phase transitions for 
Ising models on various graph ensembles (Corollaries 3.1 and 3.2). We also show that 
no consistent test, and hence no consistent estimator, exists if the log-partition function remains 
bounded (Theorem 2.3). As a consequence, consistent estimation is impossible in the 
high temperature regime in ferromagnetic Ising models on a converging sequence (in 
cut-metric as defined by Lovász and co-authors [7, 8, 27]) of graphs (Theorem 3.3). This 
strengthens previous results of Comets and Gidas [12] and Chatterjee [10], where the 
MLE and the MPLE were, respectively, shown to be inconsistent for $0 < \beta < 1$ in the 
Curie-Weiss model, which corresponds to taking $J_N(i,j) = 1/N$, for all $1 \le i < j \le N$. 

^1 A sequence of estimators $\{\hat\beta_N\}_{N \ge 1}$ is said to be $a_N$-consistent at $\beta = \beta_0$ if $a_N|\hat\beta_N - \beta_0| = O_p(1)$, 
that is, $\limsup_{K\to\infty} \limsup_{N\to\infty} \mathbb{P}_{\beta_0}(a_N|\hat\beta_N - \beta_0| > K) = 0$. 

Finally, using the emerging theory of graph limits [7, 8, 27], the limiting distribution of 
the sufficient statistic and the asymptotic power of the most powerful test are 
derived for dense graphs in the high temperature regime (Theorem 3.4). 

While proving the consistency of the MPLE, we show that the asymptotic order of the 
sufficient statistic is the same as the order of the log-partition function for general 
matrices $J_N$, a result which appears to be new and might be of independent interest. 
More precisely, the sequence of random variables $\frac{1}{a_N} H_N(\sigma)$ is asymptotically tight under 
$\mathbb{P}_{\beta_0}$, and the limiting distribution (if any) is non-zero when the log-partition function has 
order $a_N$ in a neighborhood of $\beta_0$ (Lemma 5.1). Moreover, simple bounds for matrices 
$J_N$ with non-negative entries provide the correct order of the log-partition function in 
the high temperature regime for a wide class of Ising models (Lemma 7.1). 

Finally, we illustrate the usefulness of the MPLE and the applicability of our results on 
a real dataset: In Section 4, we study the effect of gender among friends in two Facebook 
friendship-networks from the Stanford Large Network Dataset (SNAP) collection. 

Another active area of research is high-dimensional structure estimation in a sparse 
Ising model, where the goal is to consistently estimate the underlying matrix $J_N$, under 
certain structural constraints, from i.i.d. samples from the model (see [1, 9, 32, 35] and 
the references therein). This is in contrast with the present work, where the matrix $J_N$ is 
known and we estimate the natural parameter and its error rate given a single realization 
from the model. 

1.1. Organization 

The rest of the paper is organized as follows: The consistency of the MPLE and general 
inconsistency results are described in Section 2. Applications of these results to various 
graph ensembles including regular graphs, random graphs, and general weighted graphs, 
are explained in Section 3. Theorems 2.1 and 2.3 are proved in Sections 5 and 6, 
respectively. The proofs of Corollaries 3.1 and 3.2 are given in Section 7. The results on 
converging sequences of graphs are in Section 8. The analysis of the Facebook dataset is 
given in Section 4. 


2. Consistency of the MPLE 

The maximum pseudolikelihood estimator (MPLE), introduced by Besag [4, 5], 
approximates the joint distribution of $\sigma \sim \mathbb{P}_\beta$ in a way that avoids calculations 
with the normalizing constant. 

Definition 2.1. Given a random vector $(X_1, X_2, \ldots, X_N)$ whose joint distribution is 
parametrized by a parameter $\beta \in \mathbb{R}$, the MPLE of $\beta$ is defined as 

$$\hat\beta_N := \arg\max_{\beta} \prod_{i=1}^N f_i(\beta, X), \qquad (2.1)$$

where $f_i(\beta, X)$ is the conditional probability density of $X_i$ given $(X_j)_{j \ne i}$. 


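Given $\sigma \sim \mathbb{P}_\beta$ from the model (1.2), the conditional density of $\sigma_i$, given $(\sigma_j)_{j \ne i}$, can 
be easily computed. To this end, given $\tau \in S_N$, define the function $L_\tau : [0,\infty) \to \mathbb{R}$ as 

$$L_\tau(x) := \frac{1}{N} \sum_{i=1}^N m_i(\tau)\big(\tau_i - \tanh(x\, m_i(\tau))\big), \qquad (2.2)$$

where 

$$m_i(\tau) := \sum_{j=1}^N J_N(i,j)\,\tau_j. \qquad (2.3)$$

Note that $m_i(\tau)$ does not depend on $\tau_i$, since the diagonal element $J_N(i,i) = 0$. 
Interpreting $\tanh(\pm\infty) = \pm 1$, the function $L_\tau$ can be extended to $[0,\infty]$ by defining 
$L_\tau(\infty) := \frac{1}{N}\sum_{i=1}^N \big(m_i(\tau)\tau_i - |m_i(\tau)|\big)$. Then it is easy to verify that (see Chatterjee 
[10], Section 1.2) $\frac{\partial}{\partial\beta} \frac{1}{N}\sum_{i=1}^N \log f_i(\beta, \tau) = L_\tau(\beta)$, and the function $L_\tau(\beta)$ is a decreasing 
function of $\beta$. Therefore, the MPLE for $\beta$ in the model (1.2) is 

$$\hat\beta_N(\sigma) := \inf\{x \ge 0 : L_\sigma(x) = 0\}, \qquad (2.4)$$

where $\sigma \sim \mathbb{P}_\beta$ is a random element from (1.2). Hereafter, we suppress the dependence on 
$\sigma$ and denote by $\hat\beta_N := \hat\beta_N(\sigma)$ the MPLE of $\beta$. 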
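
Because the pseudolikelihood score is monotone, the estimator in (2.4) is the solution of a one-dimensional root-finding problem. A minimal Python sketch follows, assuming the forms $L_\sigma(x) = \frac{1}{N}\sum_i m_i(\sigma)(\sigma_i - \tanh(x\, m_i(\sigma)))$ and $m_i(\sigma) = \sum_j J_N(i,j)\sigma_j$ of (2.2) and (2.3). The function names, the use of bisection, and the conventions of returning 0 when $L_\sigma(0) \le 0$ and infinity when $L_\sigma$ stays positive for every finite argument are choices of this sketch, not part of the paper.

```python
import math


def local_fields(J, sigma):
    """m_i(sigma) = sum_j J[i][j] * sigma[j], as in (2.3)."""
    return [sum(Jij * sj for Jij, sj in zip(row, sigma)) for row in J]


def pseudo_score(J, sigma, x):
    """L_sigma(x) from (2.2): average of m_i * (sigma_i - tanh(x * m_i))."""
    m = local_fields(J, sigma)
    return sum(mi * (si - math.tanh(x * mi)) for mi, si in zip(m, sigma)) / len(sigma)


def mple(J, sigma, iterations=200):
    """Smallest x >= 0 with L_sigma(x) = 0, found by bracketing and bisection."""
    if pseudo_score(J, sigma, 0.0) <= 0.0:
        return 0.0  # sketch convention: no positive root, report the boundary
    m = local_fields(J, sigma)
    limit_at_inf = sum(mi * si - abs(mi) for mi, si in zip(m, sigma))
    if limit_at_inf >= -1e-12:
        return math.inf  # L_sigma stays positive for every finite x
    lo, hi = 0.0, 1.0
    while pseudo_score(J, sigma, hi) > 0.0:  # grow the bracket until the sign flips
        lo, hi = hi, 2.0 * hi
    for _ in range(iterations):  # L_sigma is non-increasing, so bisection applies
        mid = 0.5 * (lo + hi)
        if pseudo_score(J, sigma, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, for three disjoint edges of weight 1 with two aligned pairs and one anti-aligned pair, the score reduces to $(2 - 6\tanh x)/6$, so the estimate is $\tanh^{-1}(1/3)$.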

Consistency results for the MPLE in Ising models are known in the case of lattices 
[11, 19, 21, 31], complete graphs [10], and spatial point processes [24]. However, for general 
processes where the dependence is neither local nor mean-field, it is very difficult to 
prove consistency results for MPLE. In a major breakthrough, Chatterjee [10] developed 
a remarkable technique using exchangeable pairs and showed [10], Theorem 1.1, that 
the MPLE $\{\hat\beta_N\}_{N \ge 1}$, given a single realization $\sigma \in S_N$ from (1.2), is a $\sqrt{N}$-consistent 
estimator at $\beta = \beta_0 > 0$, whenever $\sup_N \|J_N\| < \infty$ and 

$$\liminf_{N\to\infty} \frac{1}{N} F_N(\beta_0) > 0. \qquad (2.5)$$

To the best of our knowledge, all results regarding the MPLE $\{\hat\beta_N\}_{N \ge 1}$ are in the regime 
where it is $\sqrt{N}$-consistent. However, in many examples such as the Ising model on dense 
graphs, $d(N)$-regular graphs with $d(N) \to \infty$, and Erdős-Rényi graphs $\mathcal{G}(N, p(N))$, with 
$\ldots \ll p(N) \le 1$, the log-partition function $F_N(\beta) = o(N)$ for certain ranges of $\beta$. In 
these cases, the hypothesis (2.5) is not satisfied, and Chatterjee's result is not applicable 
for deriving the consistency of the MPLE. The following theorem (see Section 5.2 for 
proof) shows that the consistency of the MPLE at a point is governed by the order of 
the log-partition function in a neighborhood of that point. This generalizes the result 
of Chatterjee [10], giving the rate of consistency of the MPLE for all values of $\beta$ (at all 
temperatures) away from the critical points. 

^2 For any $N \times N$ symmetric matrix $A$, denote by $\|A\| := \sup_{x \in \mathbb{R}^N \setminus \{0\}} \|Ax\|_2 / \|x\|_2$ the operator norm of $A$. 

Theorem 2.1. Let $\sup_{N \ge 1} \|J_N\| < \infty$, and $\beta_0 > 0$ be fixed. Suppose $\{a_N\}_{N \ge 1}$ is a 
sequence of positive reals diverging to $\infty$ such that for some $\delta > 0$ we have 

$$0 < \liminf_{N\to\infty} \frac{1}{a_N} F_N(\beta_0 - \delta) \le \limsup_{N\to\infty} \frac{1}{a_N} F_N(\beta_0 + \delta) < \infty. \qquad (2.6)$$

Moreover, assume that the following conditions hold: 

(a) $\limsup_{K\to\infty} \limsup_{N\to\infty} \frac{1}{a_N} \mathbb{E}_{\beta_0}\Big(\sum_{i=1}^N |m_i(\sigma)| \cdot 1\{|m_i(\sigma)| > K\}\Big) = 0$, where 
$m_i(\sigma)$ is as defined in (2.3). 

(b) $\limsup_{N\to\infty} \frac{1}{a_N} \sum_{i,j=1}^N J_N(i,j)^2 < \infty$. 

Then the MPLE $\{\hat\beta_N\}_{N \ge 1}$ for the model (1.2) is a $\sqrt{a_N}$-consistent sequence of estimators 
for $\beta = \beta_0$. 

Conditions (a) and (b) are technical requirements arising out of the proof technique, 
which ensure that the main contributions come from $m_i(\sigma)$ that are small, and on average 
the entries in $J_N$ are not too large compared to $a_N$. The proof of the result is given in 
Section 5.2 (with technical lemmas stated in Section 5.1 and proved in Appendix A). The proof is organized as 
follows: Using the two conditions of the theorem, Lemma 5.2 shows that $\mathbb{E}_{\beta_0}(L_\sigma(\beta_0)^2) = 
O(a_N/N^2)$, which implies that $L_\sigma(\beta_0)$ is small with high probability. To derive the rate of 
consistency of the pseudolikelihood estimator, it thus suffices to get a lower bound on the derivative 
$L'_\sigma(\beta)$ in absolute value. Again invoking the two conditions of Theorem 2.1, in Lemma A.3 we derive a 
lower bound on 

$$\sum_{i=1}^N m_i(\sigma)^2\, 1\{|m_i(\sigma)| \le K\}$$

for $K$ fixed. This translates into the desired lower bound on the derivative $L'_\sigma(\beta)$, using 
which the proof of the theorem is then completed. 

The conditions of the theorem are satisfied in most commonly used models (see 
Section 3). Moreover, the result of Chatterjee [10], Theorem 1.1, is an immediate corollary 
of Theorem 2.1 (refer to Section 5.3 for the proof). 

Corollary 2.2 ([10], Theorem 1.1). Let $\sup_{N \ge 1} \|J_N\| < \infty$ and $\beta_0 > 0$ be such that 
(2.5) holds. Then the sequence of estimators $\{\hat\beta_N\}_{N \ge 1}$ is $\sqrt{N}$-consistent for $\beta = \beta_0$. 

Remark 2.1. Condition (2.6) in Theorem 2.1 demands the right order of the 
log-partition function in a small neighborhood around the point $\beta_0$. This avoids the critical 
points, where the order of the log-partition function (and its derivative) undergoes a sharp 
transition. It follows from the proof of Theorem 2.1 that the following (possibly slightly 
weaker) condition works as well instead of (2.6): 

$$0 < \lim_{\delta\to 0} \liminf_{N\to\infty} \frac{1}{a_N} F'_N(\beta_0 - \delta) \le \limsup_{N\to\infty} \frac{1}{a_N} F'_N(\beta_0) < \infty.$$

However, for most of the applications, estimates of the log-partition function are more 
readily available. Thus, the sufficient conditions are stated in terms of the log-partition 
function instead of its derivative. 

Note that Theorem 2.1 does not apply to the case $F_N(\beta_0) = O(1)$. Next, we show that 
if $F_N(\beta_0) = O(1)$, then there is no sequence of estimators which consistently estimates $\beta$. 
In fact, we show that even testing is impossible in this regime: Given a single spin 
configuration $\sigma \in S_N$ from (1.2), there exists no sequence of consistent tests for the 
hypothesis testing problem: 

$$H_0 : \beta = \beta_1 \quad \text{versus} \quad H_1 : \beta = \beta_2. \qquad (2.7)$$

This is summarized in the following theorem (see Section 6 for proof): 

Theorem 2.3. Let $\sup_{N \ge 1} \|J_N\| < \infty$, and $\beta_0 > 0$ be fixed. Suppose 

$$\limsup_{N\to\infty} F_N(\beta_0) < \infty. \qquad (2.8)$$

Then for $0 \le \beta_1 < \beta_2 < \beta_0$, there exists no consistent sequence of tests for the testing 
problem (2.7). In particular, there exists no consistent sequence of estimators for $\beta$ in 
the interval $[0, \beta_0)$. 

One of the main applications of the above results is in deriving the rate of the MPLE for 
Ising models on weighted graphs, that is, for matrices $J_N$ with non-negative entries. For 
such matrices, condition (b) in Theorem 2.1 can be directly verified, and we have the 
following simplified corollary: 

Corollary 2.4. Consider the model (1.2) such that $J_N$ is a sequence of matrices with 
non-negative entries with $\lim_{N\to\infty} \|J_N\| = \lambda > 0$. 

(a) The sequence of estimators $\{\hat\beta_N\}_{N \ge 1}$ is $\|J_N\|_F := \big(\sum_{i,j=1}^N J_N(i,j)^2\big)^{1/2}$-consistent at 
$\beta = \beta_0$ for any $\beta_0 < \frac{1}{\lambda}$, whenever condition (a) in Theorem 2.1 holds. 

(b) If $\limsup_{N\to\infty} \sum_{i,j=1}^N J_N(i,j)^2 < \infty$, then there exists no consistent sequence of estimators 
for $\beta$ in the interval $[0, \frac{1}{\lambda})$. 


3. Applications 

The $\sqrt{N}$-consistency of the MPLE in the Sherrington-Kirkpatrick (SK) model and the 
Hopfield model, for all values of $\beta > 0$, follows from results of Chatterjee [10]. Our results 
give the rate of consistency of the MPLE in the regime where it is not $\sqrt{N}$-consistent. 

We begin with a simple example where the rate of the MPLE undergoes multiple 
phase transitions. 

^3 A sequence of test functions $\phi_N : S_N \to \{0,1\}$ is said to be consistent for the testing problem (2.7) 
if $\lim_{N\to\infty} \mathbb{E}_{\beta_1}\phi_N = 0$ and $\lim_{N\to\infty} \mathbb{E}_{\beta_2}\phi_N = 1$. 

Example 1. Consider the model (1.2) with 

$$J_N(i,j) = \begin{cases} \ldots \\ 0, & \text{otherwise.} \end{cases}$$

Then the sequence of estimators $\{\hat\beta_N\}_{N \ge 1}$ is inconsistent for $\beta \in (0,1)$, consistent 
for $\beta \in (1,2)$, and $\sqrt{N}$-consistent if $\beta > 2$. 

The proof of the above example is given in Section 7.2. In fact, this example can 
be easily generalized to construct a $K$-block matrix $J_N$ such that the consistency rate 
of the MPLE undergoes $K$ phase transitions. However, for most popular choices of $J_N$ the 
rate of the MPLE undergoes at most one phase transition, from $\|J_N\|_F$-consistent to 
$\sqrt{N}$-consistent. 


3.1. Ising model on regular graphs 

Let $G_N$ be a sequence of $d_N$-regular graphs. Consider the family of probability 
distributions (1.2) with the sufficient statistic 

$$H_N(\tau) = \frac{1}{d_N}\, \tau' A(G_N)\, \tau, \qquad (3.1)$$

where $A(G_N) = ((a_N(i,j)))$ is the adjacency matrix of the graph $G_N$. This includes 
Ising models on lattices, complete graphs, the hypercube, and random regular graphs, among 
others, which have been extensively studied in probability and statistical physics. Dembo 
et al. [15, 14] derived the limit of the log-partition function for random regular (and 
other locally tree-like) graphs. Levin et al. [26] showed that the mixing time of the 
Glauber dynamics on the complete graph exhibits the cutoff phenomenon [16] in the high 
temperature regime. The cutoff phenomenon for lattices was established by Lubetzky and 
Sly in a series of breakthrough papers (refer to [29, 28] and the references therein). 

The next result gives the rate of consistency of the MPLE for general regular graphs. 
The proofs are deferred to Section 7. 

Corollary 3.1. Fix $\beta_0 > 0$ and let $G_N$ be a sequence of $d_N$-regular graphs. Suppose 
$\{\hat\beta_N\}_{N \ge 1}$ is the MPLE for the model (1.2) with sufficient statistic (3.1). 

(a) If $0 < \beta_0 < 1$, $\{\hat\beta_N\}_{N \ge 1}$ is a $\sqrt{N/d_N}$-consistent sequence of estimators for $\beta_0$. 

(b) If $\beta_0 > 1$, $\{\hat\beta_N\}_{N \ge 1}$ is a $\sqrt{N}$-consistent sequence of estimators for $\beta_0$. 

The above corollary shows that the rate of the MPLE undergoes a phase transition at 
$\beta = 1$ for general regular graphs. In particular, if $d_N = d = O(1)$ remains bounded, then 
the above corollary shows that the MPLE is $\sqrt{N}$-consistent for all non-negative $\beta \ne 1$. However, 
in this case, it is easy to argue that $\liminf_{N\to\infty} \frac{1}{N} F_N(\beta) > 0$, for all $\beta > 0$ (see proof 
of lower bound in Corollary 3.1). Theorem 2.1 then concludes that $\hat\beta_N$ is $\sqrt{N}$-consistent 
for all values of $\beta > 0$. In fact, using similar arguments as in the proof of Corollary 3.1, 
it follows that the MPLE $\{\hat\beta_N\}_{N \ge 1}$ is $\sqrt{N}$-consistent for all $\beta > 0$ in all bounded degree 
graphs with at least order $N$ edges. This shows that the MPLE is $\sqrt{N}$-consistent for lattice 
graphs, re-deriving classical results (see [21] and the references therein). 

For $d_N \to \infty$, the behavior of the MPLE at $\beta = 1$ remains unclear. It is believed that 
the MPLE might have a non-Gaussian limiting distribution at the critical point $\beta = 1$ 
[10]. 

Remark 3.1. If $d_N = \Theta(N)$, then Corollary 3.1 shows that the MPLE is $O(1)$-consistent 
for $0 < \beta_0 < 1$, suggesting that the MPLE might be inconsistent in this regime. 
Chatterjee [10] showed that this is indeed the case for the Curie-Weiss model (where 
$J_N(i,j) = 1/N$ for all $i \ne j$) for $0 < \beta < 1$. Comets and Gidas [12] showed that even the 
MLE of $\beta$ in the Curie-Weiss model is inconsistent for $0 < \beta < 1$. Later, in Theorem 3.3 
we strengthen this result by showing that for Ising models on arbitrary dense graphs, there 
exists no sequence of consistent estimators before the phase transition point. This extends 
the results in [10, 12] and justifies the $O(1)$-rate of the MPLE in the dense case. 


3.2. Ising model on Erdős-Rényi graphs 

Let $G_N \sim \mathcal{G}(N, p(N))$ be a sequence of Erdős-Rényi graphs. Consider the family of 
probability distributions (1.2) with the sufficient statistic 

$$H_N(\tau) = \frac{1}{N p(N)}\, \tau' A(G_N)\, \tau, \qquad (3.2)$$

where $A(G_N) = ((a_N(i,j)))$ is the adjacency matrix of the graph $G_N$. 

Corollary 3.2. Fix $\beta_0 > 0$ and consider a sequence $G_N \sim \mathcal{G}(N, p(N))$ of Erdős-Rényi 
graphs, with $p(N) \le 1$. Let $\{\hat\beta_N\}_{N \ge 1}$ be the MPLE for the model (1.2) with 
sufficient statistic (3.2). 

(a) If $0 < \beta_0 < 1$, $\{\hat\beta_N\}_{N \ge 1}$ is a $\sqrt{1/p(N)}$-consistent sequence of estimators for $\beta_0$. 

(b) If $\beta_0 > 1$, $\{\hat\beta_N\}_{N \ge 1}$ is a $\sqrt{N}$-consistent sequence of estimators for $\beta_0$. 

As in the regular case, the rate of the MPLE undergoes a phase transition at $\beta = 1$ 
for Erdős-Rényi graphs. Figure 1 shows the error bars for the MPLE for the Ising model 
on $G_N \sim \mathcal{G}(N, p(N))$, with $N = 2000$ and $p(N) = N^{-1/3}$, for a sequence of values of 
$\beta \in [0, 2]$. 

^4 Given non-negative sequences $\{a_N\}_{N \ge 1}$ and $\{b_N\}_{N \ge 1}$, the notation $a_N = \Theta(b_N)$ means that there 
exist constants $k_1, k_2 > 0$, such that $k_1 b_N \le a_N \le k_2 b_N$ for $N$ large enough. 

Figure 1: The MPLE and the 1-standard deviation error bar in an Ising model on $G_N \sim \mathcal{G}(N, p(N))$ 
with $N = 2000$ and $p(N) = N^{-1/3}$, averaged over 100 repetitions for a sequence of values of $\beta \in [0, 2]$ 
(plot title: "MPLE in Random Graph"). 
Lengths of the error bars undergo a phase transition at $\beta = 1$, as predicted by Corollary 3.2, which shows 
that for $0 < \beta < 1$ the MPLE is $N^{1/6}$-consistent, and for $\beta > 1$ the MPLE is $\sqrt{N}$-consistent. 

3.3. Ising model on dense graphs 

Recall that the MPLE is inconsistent in the Curie-Weiss model in the high temperature 
regime, $0 < \beta < 1$ [10]. In this section, using the emerging theory of graph limits and 
Theorem 2.3 above, we strengthen this result to show that consistent testing is impossible 
in the entire high temperature regime in Ising models on a converging sequence of dense 
graphs. We also calculate the limiting distribution of the most powerful test statistic and the asymptotic 
power in this regime. 

3.3.1. Graph limit theory 

Let $G_N$ be a simple graph with vertices $V(G_N) = \{1, 2, \ldots, N\}$ and adjacency matrix 
$A(G_N)$. Lovász and co-authors [7, 8] developed a limit theory of graphs, which connects 
various topics such as graph homomorphisms, the Szemerédi regularity lemma, and extremal 
graph theory. In the following, we summarize the basic results for converging sequences 
of graphs (cf. Lovász [27] for a detailed exposition). To this end, note that any graph 
$G_N$ can be represented as a function $W_{G_N} : [0,1]^2 \to [0,1]$ in a natural way: Define 
$W_{G_N}(x,y) := 1$ if and only if $(\lceil Nx \rceil, \lceil Ny \rceil)$ is an edge in $G_N$, that is, partition $[0,1]^2$ into 
$N^2$ squares of side length $1/N$, and define $W_{G_N}(x,y) = 1$ when $(x,y)$ is in the $(a,b)$-th 
square and $(a,b)$ is an edge in $G_N$. Let $\mathcal{W}$ be the space of all measurable functions from 
$[0,1]^2$ into $[0,1]$ that satisfy $W(x,y) = W(y,x)$ for all $x, y \in [0,1]$. For every $W \in \mathcal{W}$ 
and any fixed simple graph $H = (V(H), E(H))$ define the homomorphism density 

$$t(H, W) = \int_{[0,1]^{|V(H)|}} \prod_{(i,j) \in E(H)} W(x_i, x_j)\, dx_1\, dx_2 \cdots dx_{|V(H)|}.$$

A sequence of simple graphs $\{G_N\}_{N \ge 1}$ is said to converge to $W \in \mathcal{W}$ if for every finite 
simple graph $H$, 

$$\lim_{N\to\infty} t(H, G_N) = t(H, W). \qquad (3.3)$$

The limit objects, that is, the elements of $\mathcal{W}$, are called graph limits or graphons. 
Conversely, every such function arises as the limit of an appropriate graph sequence. 
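
For a finite graph $G$, the homomorphism density $t(H, G)$, taken to mean $t(H, W_G)$, reduces to a finite count: the fraction of maps from $V(H)$ to $V(G)$ that send every edge of $H$ to an edge of $G$. The brute-force Python sketch below illustrates this for small graphs. The identification of $t(H, G)$ with $t(H, W_G)$ is the standard one in graph limit theory and is assumed here; the function name and the edge-list representation are illustrative.

```python
import itertools


def hom_density(h_edges, h_size, adj):
    """t(H, G): fraction of maps V(H) -> V(G) that preserve all edges of H.

    h_edges: list of pairs (u, v) with 0 <= u, v < h_size (the edges of H).
    adj: 0/1 adjacency matrix of the simple graph G (list of lists).
    Enumerates all n^h_size maps, so it is only meant for small inputs.
    """
    n = len(adj)
    count = 0
    for phi in itertools.product(range(n), repeat=h_size):
        if all(adj[phi[u]][phi[v]] == 1 for u, v in h_edges):
            count += 1
    return count / n ** h_size


if __name__ == "__main__":
    # Complete graph K_4: 12 of the 16 vertex pairs map an edge to an edge,
    # and 24 of the 64 vertex triples map a triangle to a triangle.
    k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
    print(hom_density([(0, 1)], 2, k4))
    print(hom_density([(0, 1), (1, 2), (0, 2)], 3, k4))
```

Convergence in the sense of (3.3) asks that such densities stabilize, for every fixed $H$, as $N$ grows.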

It turns out that the above notion of convergence can be suitably metrized using the 
so-called cut-metric (cf. [27], Chapter 8, for details). Moreover, every function $W \in \mathcal{W}$ 
defines an operator $T_W : L_2[0,1] \to L_2[0,1]$: 

$$(T_W f)(x) = \int_0^1 W(x,y) f(y)\, dy. \qquad (3.4)$$

$T_W$ is a Hilbert-Schmidt operator with operator norm denoted by $\|W\|$, which is compact 
and has a discrete spectrum, that is, a countable multiset of non-zero real eigenvalues 
$\{\lambda_i(W)\}_{i \in \mathbb{N}}$. In particular, every non-zero eigenvalue has finite multiplicity and 

$$\sum_{i=1}^\infty \lambda_i^2(W) = \int_{[0,1]^2} W(x,y)^2\, dx\, dy := \|W\|_2^2. \qquad (3.5)$$

3.3.2. Consistency and asymptotic power 

Recall that for a graph $G_N$, $A(G_N)$ is the adjacency matrix of $G_N$. Now, using graph 
limit theory we show the following result: 

Theorem 3.3. Let $\{G_N\}_{N \ge 1}$ be a sequence of simple graphs which converges in 
cut-metric to $W \in \mathcal{W}$ such that $\int_{[0,1]^2} W(x,y)\, dx\, dy > 0$. Consider the testing problem 
(2.7) given a single realization $\sigma \in S_N$ from (1.2) with sufficient statistic $H_N(\tau) = 
\frac{1}{N} \tau' A(G_N) \tau$. 

(a) If $0 \le \beta_1 < \beta_2 < \frac{1}{\|W\|}$, there does not exist a sequence of consistent tests for 
(2.7). 

(b) If $\beta_0 > \frac{1}{\|W\|}$, then the MPLE $\{\hat\beta_N\}_{N \ge 1}$ is a sequence of $\sqrt{N}$-consistent estimators 
for $\beta = \beta_0$. 

The proof of the theorem is given in Section 8. It involves showing that $F_N(\beta_0) = 
O(1)$ whenever $0 < \beta_0 < \frac{1}{\|W\|}$ for any converging sequence of graphs, which together 
with Theorem 2.3 proves (a). To show (b) it suffices to show that $\lim_{N\to\infty} \frac{1}{N} F_N(\beta_0) > 
0$, for $\beta_0 > \frac{1}{\|W\|}$ (by Corollary 2.2). For Ising models on a converging sequence of 
graphs, $\lim_{N\to\infty} \frac{1}{N} F_N(\beta_0)$ is given by a variational problem (8.1) (cf. [8], Theorem 2.14). 
Even though explicitly solving this variational problem for large values of $\beta$ is extremely 
difficult, a simple argument can be used to show that the value of the variational problem 
is positive for $\beta > \frac{1}{\|W\|}$. 

By the Neyman-Pearson lemma, the most powerful (MP) test for (2.7) is based on 
the sufficient statistic $H_N(\sigma)$. By Theorem 3.3, the test based on $H_N(\sigma)$ is not consistent 
(see Figure 2). However, the asymptotic power of the MP-test can be derived from the 
limiting distribution of $H_N(\sigma)$, for any $\beta < \frac{1}{\|W\|}$. 


Figure 2: The power of the MP-test for the Ising model on an Erdős-Rényi random graph $\mathcal{G}(N, p)$ 
as a function of $p$ and $\beta$, with $N = 500$ (plot title: "Ising model on random graph $\mathcal{G}(N, p)$"; 
horizontal axis: edge probability $p$, from 0 to 1). Every point $(p, \beta)$ in the grid shows the empirical power of 
the MP-test averaged over 100 repetitions. Note the phase transition curve $\beta(p) = \frac{1}{p}$ above which the 
MP-test has power 1, as predicted by Theorem 3.4. 


Theorem 3.4. Let $\{G_N\}_{N \ge 1}$ be a sequence of simple graphs which converges in 
cut-metric to $W \in \mathcal{W}$, with $\int_{[0,1]^2} W(x,y)\, dx\, dy > 0$. If $\sigma \sim \mathbb{P}_\beta$, then for $\beta < \frac{1}{\|W\|}$, 

$$H_N(\sigma) \xrightarrow{D} \sum_{i=1}^\infty \left(\frac{1}{1 - \beta\lambda_i(W)}\, \xi_i - 1\right) \lambda_i(W), \qquad (3.6)$$

where $\{\xi_i\}_{i \ge 1}$ are i.i.d. $\chi^2_1$ random variables. 

Hereafter, the random variable in the RHS of (3.6) will be denoted by $Q_{\beta,W}$. It can 
be used to compute the asymptotic power for the test based on $H_N(\sigma)$ for the testing 
problem (2.7), when $0 \le \beta_1 < \beta_2 < \frac{1}{\|W\|}$. To this end, we need the following definition: 












Definition 3.1. Let $W \in \mathcal{W}$ and $\beta < \frac{1}{\|W\|}$. Denote by $F_{\beta,W}$ the distribution function 
of the random variable $Q_{\beta,W}$ defined in (3.6). Also, let $q_{1-\alpha,\beta,W}$ be the $(1-\alpha)$-th quantile 
of $F_{\beta,W}$, that is, $\mathbb{P}(Q_{\beta,W} > q_{1-\alpha,\beta,W}) = \alpha$. 

The following corollary is an immediate consequence of the Neyman-Pearson lemma 
and Theorem 3.4. 

Corollary 3.5. Fix $\alpha \in (0,1)$ and $0 \le \beta_1 < \beta_2 < \frac{1}{\|W\|}$. The most powerful level $\alpha$ test 
for (2.7) rejects $H_0$ when $H_N(\sigma) > q_{1-\alpha,\beta_1,W}$, and has limiting power 

$$\lim_{N\to\infty} \mathbb{P}_{\beta_2}\big(H_N(\sigma) > q_{1-\alpha,\beta_1,W}\big) = 1 - F_{\beta_2,W}(q_{1-\alpha,\beta_1,W}). \qquad (3.7)$$

In most of the relevant examples, the limiting graphon $W$ has finitely many non-zero 
eigenvalues, and the expression on the RHS of (3.7) can be computed easily in terms of 
the quantiles of the chi-squared distribution. 

Example 2. Suppose $G_N \sim \mathcal{G}(N, p)$ is an Erdős-Rényi random graph with $0 < p \le 1$. 
Then $G_N$ converges to the constant function $W_p := p$ on $[0,1]^2$, which has only one 
non-zero eigenvalue $\lambda_1(W_p) = p$. Therefore, consistent testing is impossible for $0 \le \beta < \frac{1}{p}$ 
(see Figure 2). Moreover, for $\beta < 1/p$, (3.6) simplifies to 

$$H_N(\sigma) \xrightarrow{D} p\left(\frac{\xi_1}{1 - \beta p} - 1\right).$$

If $q_{1-\alpha}$ denotes the $(1-\alpha)$-th quantile of the $\chi^2_1$ distribution, then by (3.7), the limiting 
power of the test with rejection region $\{H_N(\sigma) > c_\alpha := p(q_{1-\alpha} - 1)\}$ for the testing 
problem $\beta = 0$ versus $\beta = \beta_0 < 1/p$ is 

$$\lim_{N\to\infty} \mathbb{P}_{\beta_0}(H_N(\sigma) > c_\alpha) = \mathbb{P}\big(\chi^2_1 > (1 - \beta_0 p)\, q_{1-\alpha}\big). \qquad (3.8)$$

The limiting power of the MP-test for the Curie-Weiss model (which corresponds to 
taking $p = 1$ in (3.8)) is shown in Figure 3. Note that it has a phase transition at $\beta = 1$, 
as stated in Theorem 3.3. 
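
The limiting power in (3.8) involves only the $\chi^2_1$ distribution, so it can be evaluated with the Python standard library. The sketch below relies on the identities $\mathbb{P}(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$ and $q_{1-\alpha} = z_{1-\alpha/2}^2$, where $z$ denotes a standard normal quantile, and assumes the right-hand side of (3.8) has the form $\mathbb{P}(\chi^2_1 > (1 - \beta_0 p)\, q_{1-\alpha})$. The function names are illustrative.

```python
import math
from statistics import NormalDist


def chi2_1_sf(x):
    """P(chi^2_1 > x) for x >= 0, via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_1_quantile(prob):
    """Quantile of chi^2_1: the square of the standard normal (1 + prob)/2 quantile."""
    return NormalDist().inv_cdf(0.5 * (1.0 + prob)) ** 2


def limiting_power(beta0, p, alpha=0.05):
    """Right-hand side of (3.8): P(chi^2_1 > (1 - beta0 * p) * q_{1 - alpha}).

    Applies to the test of beta = 0 versus beta = beta0, for 0 <= beta0 < 1/p.
    """
    if not 0.0 <= beta0 * p < 1.0:
        raise ValueError("requires 0 <= beta0 < 1/p")
    return chi2_1_sf((1.0 - beta0 * p) * chi2_1_quantile(1.0 - alpha))
```

At $\beta_0 = 0$ the function returns the level $\alpha$, and the value increases towards 1 as $\beta_0$ approaches $1/p$ from below.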


Remark 3.2. Note that throughout the paper, the term phase transition has been 
used to imply a change in the rate of consistency of the pseudolikelihood estimate $\hat\beta_N$. 
Interestingly, in all our examples (Corollaries 3.1, 3.2 and Theorem 3.3) the change in 
the rate of consistency happens exactly at the point of thermodynamic phase transition, 
that is, prior to this phase transition point the log-partition function is $o(N)$, whereas 
after the phase transition point the log-partition function scales linearly with $N$. In fact, 
in the setting of Corollary 3.1, the limiting log-partition function is continuous but not 
differentiable at the phase transition point $\beta = 1$ (see [3], Theorem 2.2(b)). Similar 
statements about the non-differentiability of the limiting log-partition function should also 
hold for the other two examples, but since they are not directly used in our calculations, 
this direction has not been pursued. 

Figure 3: The power of the MP-test in the Curie-Weiss model as a function of $\beta$ (plot title: "Ising 
Model on the Complete Graph"); the black curve is 
the empirical power for the Curie-Weiss model with $N = 500$ and 1000 repetitions at each point along a 
sequence of values (of length 500) of $\beta \in [0, 2]$. The red curve is the limiting power function (corresponding 
to taking $p = 1$ in (3.8)) as a function of $\beta \in [0, 2]$. The blue line corresponds to the level $\alpha = 0.05$ of 
the test. 

4. Analysis of the Facebook dataset 

Ising models have been widely used to understand correlations among neighboring 
vertices in network data with binary node attributes. Here, we use them to study the effect 
of gender in Facebook friendship-networks using data from the Stanford Large Network 
Dataset (SNAP) collection, available freely at 
http://snap.stanford.edu/data/egonets-Facebook.html. The nodes are groups of users from Facebook and there is an 
edge between two users if they are friends. The dataset also includes several anonymized 
node features, such as hometown, gender, birthday, school, and university. We consider 
two networks (referred to as FB1 and FB2) with gender as the binary node feature, 
encoding, without loss of generality, male by 1 and female by -1. The nodes labelled 
1 are colored blue and those labelled -1 are colored red. The FB1 network has 221 
nodes and 3176 edges. Among the 221 nodes, 170 are labelled 1 and 51 are labelled -1. 
The FB2 network has 333 nodes and 2519 edges, with 213 nodes labelled 1 and 120 
labelled -1. 

In order to understand how gender correlates with friendship, we fit Ising models on the 
two networks. The MPLEs for $\beta$ corresponding to the two networks are given in the table 
in Figure 4. These can be used to test the null hypothesis that gender does not correlate 
with friendship. The p-values show that the null hypothesis is rejected at the 5% level 
in both cases, suggesting, as expected, significant correlation in the friendship-network 
based on gender. The MPLE in FB1 is larger, which suggests a stronger gender-based 
correlation in FB1, which might be due to the larger male-to-female ratio in FB1 than 
in FB2. 



                      FB1            FB2 
(Vertices, Edges)     (221, 3176)    (333, 2519) 
Average Degree        28.74          15.13 
MPLE                  1.0518         0.8530 
p-value               0.0045         0.0001 

Figure 4: Facebook friendship-network: The table gives the MPLE of $\beta$ for Ising models on two 
Facebook friendship-networks and the corresponding p-values for testing independence. The plot shows the 
empirical (resampled) error-bars for the MPLE in the two networks. 

Figure 4 also shows the error bars for the MPLE calculated using parametric bootstrap: 
10^ realizations of the Ising model were resampled using the original MPLE, which then 
gives an estimate of the standard error of the MPLE. Note that the error bar for FB1 is 
slightly longer than that for FB2. This might be because the FB1 network, with average 
degree 28.74, is significantly denser than FB2, which has average degree 15.13. 
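
The resampling step of a parametric bootstrap can be sketched as follows. The paper does not state which sampler was used; the Python code below assumes a single-site Gibbs (Glauber) sampler based on the conditional law $\mathbb{P}_\beta(\sigma_i = 1 \mid (\sigma_j)_{j \ne i}) = \frac{1}{2}(1 + \tanh(\beta\, m_i(\sigma)))$, which is the conditional density whose log-derivative gives the summands of (2.2). The estimator is passed in as a function, and the names `gibbs_sample` and `bootstrap_se`, the number of sweeps, the number of replicates, and the seed are all hypothetical choices of this sketch.

```python
import math
import random
import statistics


def gibbs_sample(J, beta, sweeps=200, rng=None):
    """Approximate draw from the Ising model via single-site Gibbs updates.

    Each update resamples sigma_i from its conditional law given the rest:
    P(sigma_i = +1 | rest) = (1 + tanh(beta * m_i)) / 2.
    The chain is only approximately stationary after finitely many sweeps.
    """
    rng = rng or random.Random()
    n = len(J)
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            m_i = sum(J[i][j] * sigma[j] for j in range(n) if j != i)
            p_plus = 0.5 * (1.0 + math.tanh(beta * m_i))
            sigma[i] = 1 if rng.random() < p_plus else -1
    return sigma


def bootstrap_se(J, beta_hat, estimator, replicates=100, sweeps=200, seed=0):
    """Parametric bootstrap standard error of `estimator` at the fitted beta_hat.

    `estimator(J, sigma)` must return a finite number for every resample.
    """
    rng = random.Random(seed)
    draws = [estimator(J, gibbs_sample(J, beta_hat, sweeps, rng))
             for _ in range(replicates)]
    return statistics.stdev(draws)
```

In use, `estimator` would be a routine that computes the MPLE from a configuration, and `beta_hat` the MPLE of the observed network.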


5. Proof of consistency of the MPLE 

This section contains the proof of Theorem 2.1. The technical lemmas required for the 
proof are listed in Section 5.1 and proved later in Appendix A. Using these, we complete 
the proof of the theorem in Section 5.2. Corollary 2.2 is proved in Section 5.3. 


5.1. Technical lemmas 

The proof of Theorem 2.1 requires a few technical lemmas. We begin by showing that in
Ising models satisfying (2.6), the asymptotic order of the sufficient statistic $H_N(\sigma)$ is the
same as the order of the log-partition function, that is, (a) the sequence $\frac{1}{a_N}H_N(\sigma)$ does
not tend to 0 in distribution, and (b) $\frac{1}{a_N}H_N(\sigma)$ is $O_P(1)$. In fact, (b) is not required
in the rest of the proof; we include it because, together with (a), it gives the
correct order of $H_N(\sigma)$, which appears to be new and might be of independent interest.

The proof of the lemma is given in Appendix A.1.

Lemma 5.1. Under assumption (2.6), the following hold:

(a) $\lim_{\varepsilon\to 0}\limsup_{N\to\infty}\mathbb{P}_{\beta_0}(H_N(\sigma) < \varepsilon a_N) = 0$,
(b) $\lim_{K\to\infty}\limsup_{N\to\infty}\mathbb{P}_{\beta_0}(H_N(\sigma) > K a_N) = 0$.

The next lemma is similar to [10], Lemma 1.2, where it was shown that
the second moment of the function $L_\sigma(\beta_0)$ is $O(1/N)$ whenever the log-partition function
scales like $N$. Here, by a finer analysis using part (a) of Lemma 5.1, we show that
$\mathbb{E}_{\beta_0}(L_\sigma(\beta_0)^2) = O(a_N/N^2)$ if the log-partition function has order $a_N$. The proof of the
lemma is given in Appendix A.2.


Lemma 5.2. Let $L_\sigma(\cdot)$ be as defined in (2.2). Then under the assumptions in Theorem 2.1,

\[
\limsup_{N\to\infty}\frac{N^2}{a_N}\,\mathbb{E}_{\beta_0}(L_\sigma(\beta_0)^2) < \infty.
\]


Lemma 5.3. Under the assumptions in Theorem 2.1,

\[
\lim_{\varepsilon\to 0}\lim_{K\to\infty}\limsup_{N\to\infty}\mathbb{P}_{\beta_0}\left(\sum_{i=1}^N m_i(\sigma)^2\,1\{|m_i(\sigma)| \le K\} \le \varepsilon a_N\right) = 0.
\]


The above lemma replaces the application of Paley-Zygmund inequality of [10], Lemma 2.2, 
and will be used to complete the proof of Theorem 2.1. 


5.2. Completing the proof of Theorem 2.1 


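By Chebyshev's inequality and Lemma 5.2 there exists $C < \infty$ such that

\[
\mathbb{P}_{\beta_0}\left(|L_\sigma(\beta_0)| > K_1\frac{\sqrt{a_N}}{N}\right) \le \frac{N^2}{K_1^2 a_N}\,\mathbb{E}_{\beta_0}L_\sigma(\beta_0)^2 \le \frac{C}{K_1^2}. \tag{5.1}
\]

Now, fix $\delta > 0$. It is then possible to choose $K_1 = K_1(\delta)$ such that the RHS above
is less than $\delta$.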

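Also, by Lemma 5.3 there exist $\varepsilon := \varepsilon(\delta) > 0$ and $K_2 = K_2(\varepsilon, \delta) < \infty$ such that

\[
\mathbb{P}_{\beta_0}\left(\sum_{i=1}^N m_i(\sigma)^2\,1\{|m_i(\sigma)| \le K_2\} \ge \varepsilon a_N\right) \ge 1 - \delta, \tag{5.2}
\]

for $N$ large enough. Thus, taking $N$ large enough and setting

\[
T_N(\beta_0) := \left\{\sigma \in S_N : |L_\sigma(\beta_0)| \le K_1\frac{\sqrt{a_N}}{N},\ \sum_{i=1}^N m_i(\sigma)^2\,1\{|m_i(\sigma)| \le K_2\} \ge \varepsilon a_N\right\},
\]

we have $\mathbb{P}_{\beta_0}(T_N(\beta_0)) \ge 1 - \delta$. For $\sigma \in T_N(\beta_0)$,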

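\[
\left|\frac{\partial}{\partial\beta}L_\sigma(\beta)\Big|_{\beta=\beta_0}\right|
= \frac{1}{N}\sum_{i=1}^N m_i(\sigma)^2\,\mathrm{sech}^2(\beta_0 m_i(\sigma))
\ge \frac{1}{N}\,\mathrm{sech}^2(\beta_0 K_2)\sum_{i=1}^N m_i(\sigma)^2\,1\{|m_i(\sigma)| \le K_2\}
\ge \frac{\varepsilon a_N}{N}\,\mathrm{sech}^2(\beta_0 K_2). \tag{5.3}
\]

The same argument applies with $\beta_0$ replaced by any $\beta$, since $\mathrm{sech}^2(\beta m_i(\sigma)) \ge \mathrm{sech}^2(\beta K_2)$ whenever $|m_i(\sigma)| \le K_2$. Therefore,

\[
K_1\frac{\sqrt{a_N}}{N} \ge |L_\sigma(\beta_0)| = |L_\sigma(\beta_0) - L_\sigma(\hat\beta_N(\sigma))|
\ge \int_{\beta_0\wedge\hat\beta_N(\sigma)}^{\beta_0\vee\hat\beta_N(\sigma)}\left|\frac{\partial}{\partial\beta}L_\sigma(\beta)\right| d\beta
\ge \frac{\varepsilon a_N}{K_2 N}\,|\tanh(K_2\hat\beta_N(\sigma)) - \tanh(K_2\beta_0)|. \tag{5.4}
\]

Let $R = R(\delta) := K_1K_2/\varepsilon$. This implies that

\[
\mathbb{P}_{\beta_0}\left(\sqrt{a_N}\,|\tanh(K_2\hat\beta_N) - \tanh(K_2\beta_0)| > R\right) \le \delta,
\]

and Theorem 2.1 follows.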
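
The key inequality in the argument above, namely that a small value of $|L_\sigma(\beta_0) - L_\sigma(\beta)|$ forces $\tanh(K_2\beta)$ to be close to $\tanh(K_2\beta_0)$, holds for every fixed configuration and can be checked numerically. The sketch below is an illustration only: the random coupling matrix, the random configuration and all function names are hypothetical, and the check uses the deterministic bound with $S = \sum_i m_i(\sigma)^2 1\{|m_i(\sigma)| \le K_2\}$ in place of $\varepsilon a_N$.

```python
import math
import random


def local_fields(J, sigma):
    """m_i(sigma) = sum_j J[i][j] * sigma[j]."""
    return [sum(Jij * sj for Jij, sj in zip(row, sigma)) for row in J]


def pseudo_score(m, sigma, beta):
    """L_sigma(beta) = (1/N) sum_i m_i (sigma_i - tanh(beta m_i))."""
    return sum(mi * (si - math.tanh(beta * mi)) for mi, si in zip(m, sigma)) / len(m)


def tanh_gap_bound(m, beta0, beta, K2):
    """(S / (N K2)) |tanh(K2 beta) - tanh(K2 beta0)|, S = sum_i m_i^2 1{|m_i| <= K2}."""
    S = sum(mi * mi for mi in m if abs(mi) <= K2)
    return S / (len(m) * K2) * abs(math.tanh(K2 * beta) - math.tanh(K2 * beta0))


def random_instance(n, seed):
    """Hypothetical test instance: symmetric non-negative J with zero diagonal."""
    rng = random.Random(seed)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.random() / n
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    return J, sigma
```

For any instance, `abs(pseudo_score(m, sigma, beta0) - pseudo_score(m, sigma, beta))` should be at least `tanh_gap_bound(m, beta0, beta, K2)`, up to floating-point error.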


5.3. Proof of Corollary 2.2 

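Note that for $\tau \in S_N$ and any $K > 0$,

\[
\frac{1}{N}\sum_{i=1}^N |m_i(\tau)|\,1\{|m_i(\tau)| > K\} \le \frac{1}{NK}\sum_{i=1}^N m_i(\tau)^2 \le \frac{\|J_N\|^2}{K}.
\]

Therefore, condition (a) in Theorem 2.1 holds with $a_N = N$.
Moreover,

\[
\sum_{i,j=1}^N J_N(i,j)^2 = \sum_{i=1}^N \|J_N e_i\|^2 \le \sum_{i=1}^N \|J_N\|^2,
\]

that is, condition (b) in Theorem 2.1 holds with $a_N = N$.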

Finally, to check (2.6) note that $F_N'(\beta) = \frac{1}{2}\mathbb{E}_\beta\,\sigma' J_N\sigma \le \frac{M}{2}N$, where $M := \sup_{N\ge 1}\|J_N\| < \infty$.

Therefore,

\[
\lim_{\delta\to 0}\liminf_{N\to\infty}\frac{1}{N}F_N(\beta_0 - \delta) \ge \lim_{\delta\to 0}\liminf_{N\to\infty}\left\{\frac{1}{N}F_N(\beta_0) - \frac{M\delta}{2}\right\} > 0,
\]

by condition (2.5). Also, $\lim_{\delta\to 0}\limsup_{N\to\infty}\frac{1}{N}F_N(\beta_0 + \delta) \le \frac{M}{2}\lim_{\delta\to 0}(\beta_0 + \delta) < \infty$.
This verifies (2.6) and by an application of Theorem 2.1 the result follows.


In this section, we give the proof of Theorem 2.3, which shows that consistent testing and
estimation are impossible whenever the partition function is $O(1)$. This is a consequence
of a general result (see Proposition 6.1 below) which shows that distinguishing two
probability measures $\mathbb{P}_N$ versus $\mathbb{Q}_N$ is impossible whenever the KL divergence between the
two measures $\mathbb{P}_N$ and $\mathbb{Q}_N$ remains asymptotically bounded.


6.1. Non-existence of consistent tests 


For every $N \ge 1$, let $\mathbb{P}_N$ and $\mathbb{Q}_N$ be two distributions on a common measure space.
Let $\mu_N$ be a dominating measure for both $\mathbb{P}_N$ and $\mathbb{Q}_N$, and let $p_N$
and $q_N$ denote the respective densities with respect to this measure. Also, denote the
Kullback-Leibler (KL) divergence between $\mathbb{Q}_N$ and $\mathbb{P}_N$ by

\[
D(\mathbb{Q}_N\|\mathbb{P}_N) := \mathbb{E}_{\mathbb{Q}_N}L_N(X) := \mathbb{E}_{\mathbb{Q}_N}\log\frac{q_N(X)}{p_N(X)} = \int q_N(x)\log\frac{q_N(x)}{p_N(x)}\,d\mu_N. \tag{6.1}
\]


Consider the problem of testing $\mathbb{P}_N$ versus $\mathbb{Q}_N$. A sequence of test functions $\{\phi_N\}_{N\ge 1}$ is consistent
for this testing problem if
$\lim_{N\to\infty}\mathbb{E}_{\mathbb{P}_N}\phi_N = 0$ and $\lim_{N\to\infty}\mathbb{E}_{\mathbb{Q}_N}\phi_N = 1$.


Proposition 6.1. Consider the problem of testing $\mathbb{P}_N$ versus $\mathbb{Q}_N$. If

\[
\limsup_{N\to\infty} D(\mathbb{Q}_N\|\mathbb{P}_N) < \infty, \tag{6.2}
\]

then there does not exist a consistent sequence of tests for this testing problem.


The proof of the proposition is given in Appendix B. In the following, we use it to 
prove Theorem 2.3. 
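
To see what the bounded-divergence condition looks like in an Ising model, the sketch below evaluates the KL divergence between two Curie-Weiss measures exactly. It is an illustration under stated assumptions: the coupling matrix is taken to be $J_N = (\mathbf{1}\mathbf{1}' - I)/N$, so that $\sigma'J_N\sigma = (S^2 - N)/N$ with $S = \sum_i\sigma_i$, and the divergence is computed from the exponential-family identity $D(\mathbb{P}_{\beta_1}\|\mathbb{P}_{\beta_2}) = F_N(\beta_2) - F_N(\beta_1) - (\beta_2 - \beta_1)F_N'(\beta_1)$, which follows from $F_N'(\beta) = \frac{1}{2}\mathbb{E}_\beta\,\sigma'J_N\sigma$. The function names are hypothetical.

```python
import math


def _log_weights(n, beta):
    """log of C(n,k) 2^{-n} exp{(beta/2)(s^2 - n)/n} with s = n - 2k, k = 0..n."""
    out = []
    for k in range(n + 1):
        s = n - 2 * k
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        out.append(log_binom - n * math.log(2.0) + 0.5 * beta * (s * s - n) / n)
    return out


def log_partition(n, beta):
    """F_N(beta) = log E_0 exp{(beta/2) sigma' J_N sigma}, via log-sum-exp."""
    lw = _log_weights(n, beta)
    mx = max(lw)
    return mx + math.log(sum(math.exp(v - mx) for v in lw))


def log_partition_derivative(n, beta):
    """F_N'(beta) = (1/2) E_beta[sigma' J_N sigma]."""
    lw = _log_weights(n, beta)
    mx = max(lw)
    w = [math.exp(v - mx) for v in lw]
    total = sum(w)
    return sum(wk * 0.5 * ((n - 2 * k) ** 2 - n) / n for k, wk in enumerate(w)) / total


def kl_divergence(n, beta1, beta2):
    """D(P_beta1 || P_beta2) for the Curie-Weiss model on n spins."""
    return (log_partition(n, beta2) - log_partition(n, beta1)
            - (beta2 - beta1) * log_partition_derivative(n, beta1))
```

Evaluating `kl_divergence(n, 0.3, 0.6)` for increasing `n` shows a divergence that stays bounded when both parameters are below 1, whereas `kl_divergence(n, 1.5, 2.0)` grows with `n`.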


6.2. Completing the proof of Theorem 2.3 

Given Proposition 6.1, it remains to verify that

\[
\limsup_{N\to\infty} D(\mathbb{P}_{\beta_1}\|\mathbb{P}_{\beta_2}) = \limsup_{N\to\infty}\left\{F_N(\beta_2) - F_N(\beta_1) - (\beta_2 - \beta_1)F_N'(\beta_1)\right\} < \infty, \tag{6.3}
\]

for $0 \le \beta_1 < \beta_2 < \beta_0$ (where $\beta_0$ satisfies (2.8)).

By hypothesis (2.8) there exists $M < \infty$ such that $F_N(\beta_1) \le M$ and $F_N(\beta_2) \le M$,
for $N$ large enough. Moreover, by the monotonicity of $F_N'(\cdot)$,

\[
(\beta_2 - \beta_1)F_N'(\beta_1) \le \int_{\beta_1}^{\beta_2} F_N'(t)\,dt = F_N(\beta_2) - F_N(\beta_1) \le M,
\]

proving (6.3).


7. Applications: Proofs of Corollary 2.4, 3.1 and 3.2

In this section, we prove Corollary 2.4, which will then be used to derive rates of consistency
of the MPLE for Ising models on different graph ensembles, using Theorems 2.1
and 2.3. To apply these results, we need to determine the correct order of $F_N(\beta_0)$ in a
neighborhood of a point $\beta_0 > 0$. However, the exact asymptotics of $F_N(\beta_0)$ are known only
for specific choices of the matrix $J_N$ and for specific values of $\beta_0$.

Nevertheless, the correct order of $F_N(\beta_0)$ can be easily obtained in various examples,
using, for instance, the following lemma, which is of independent interest and
may find other applications.

Lemma 7.1. Consider the family of probability distributions on $S_N$ given by (1.2).
Assume that the elements of the matrix $J_N$ are non-negative, and $\lambda_1(J_N) \le \lambda_2(J_N) \le
\cdots \le \lambda_N(J_N)$ are the eigenvalues of the matrix $J_N$.

(a) For $0 < \beta < 1/\lambda_N(J_N)$,

\[
F_N(\beta) \le -\frac{1}{2}\sum_{i=1}^N \log(1 - \beta\lambda_i(J_N)). \tag{7.1}
\]

(b) For any $\beta > 0$,

\[
F_N(\beta) \ge \sum_{1\le i<j\le N}\log\cosh(\beta J_N(i,j)). \tag{7.2}
\]

Proof. Let $W := (W_1, W_2, \ldots, W_N)'$ be a vector of i.i.d. $N(0,1)$ random variables. Note
that for any $s \ge 1$ and non-negative integers $b_1, b_2, \ldots, b_s$,

\[
\mathbb{E}_0\,\sigma_1^{b_1}\sigma_2^{b_2}\cdots\sigma_s^{b_s} \le \mathbb{E}\,W_1^{b_1}W_2^{b_2}\cdots W_s^{b_s}.
\]

Since the matrix $J_N$ has non-negative entries, by expanding the exponential function in
power series every term can be bounded using the above inequality. This implies that

\[
e^{F_N(\beta)} = \mathbb{E}_0\,e^{\frac{\beta}{2}\sigma'J_N\sigma} \le \mathbb{E}\,e^{\frac{\beta}{2}W'J_NW}. \tag{7.3}
\]

The RHS of (7.3) can be computed exactly as follows: Let $J_N = \sum_{i=1}^N\lambda_i(J_N)p_ip_i'$ be
the spectral decomposition of $J_N$, where $p_1, p_2, \ldots, p_N$ are the normalized eigenvectors
of $J_N$. Then setting $Z_i = p_i'W$ for $1 \le i \le N$, we get

\[
W'J_NW = \sum_{i=1}^N\lambda_i(J_N)Z_i^2. \tag{7.4}
\]

Note that $Z := (Z_1, Z_2, \ldots, Z_N)$ is a vector of i.i.d. $N(0,1)$ random variables. Therefore,
(7.3) and (7.4) imply

\[
e^{F_N(\beta)} \le \mathbb{E}\,e^{\frac{\beta}{2}W'J_NW} = \prod_{i=1}^N(1 - \beta\lambda_i(J_N))^{-1/2},
\]

using the MGF of the chi-squared distribution (since $\beta\lambda_i(J_N) < 1$, for all $1 \le i \le N$).
The inequality (7.1) follows by taking log on both sides.

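To prove (b), let $\{Y_{ij}, 1 \le i < j \le N\}$ be i.i.d. with $\mathbb{P}(Y_{ij} = \pm 1) = \frac{1}{2}$. Then for any
collection of non-negative integers $\{b_{ij}\}_{1\le i<j\le N}$,

\[
\mathbb{E}\prod_{1\le i<j\le N}Y_{ij}^{b_{ij}} \le \mathbb{E}_0\prod_{1\le i<j\le N}(\sigma_i\sigma_j)^{b_{ij}}.
\]

Indeed, this follows on noting that both the LHS and RHS are $\{0,1\}$-valued, and the
LHS is 1 if and only if $b_{ij}$ is even for all $i < j$, which is when the RHS is 1 as well. This
implies,

\[
e^{F_N(\beta)} = \mathbb{E}_0\,e^{\beta\sum_{1\le i<j\le N}J_N(i,j)\sigma_i\sigma_j} \ge \mathbb{E}\prod_{1\le i<j\le N}e^{\beta J_N(i,j)Y_{ij}} = \prod_{1\le i<j\le N}\cosh(\beta J_N(i,j)).
\]

The inequality (7.2) follows on taking log on both sides. □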

Remark 7.1. Note that the upper bound (7.1) is obtained by replacing the spin configuration
$\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_N)$ with a vector of i.i.d. $N(0,1)$ random variables. To get the
lower bound, the collection $\{\sigma_i\sigma_j\}_{1\le i<j\le N}$ is replaced by i.i.d. Rademacher random variables.
Surprisingly, the bounds obtained by these simple comparison techniques often give
the correct asymptotic order of $F_N(\beta)$ in the high temperature regime. To get
the order of $F_N(\beta)$ beyond the phase transition, the standard mean-field approximation
can be used (see Section 7.1 for details). 
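
The two comparison bounds of Lemma 7.1 can be checked by brute force on a small example. The sketch below is an illustration only, with hypothetical function names: it takes the cycle graph on $n$ vertices with $J_N = A/2$, whose eigenvalues are $\cos(2\pi k/n)$ for $k = 0, \ldots, n-1$, computes $F_N(\beta)$ by enumerating all $2^n$ configurations, and compares it with the right-hand sides of (7.1) and (7.2) for $\beta < 1$.

```python
import math
from itertools import product


def cycle_couplings(n):
    """J = A/2 for the cycle graph on n >= 3 vertices (zero diagonal)."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        J[i][(i + 1) % n] = J[(i + 1) % n][i] = 0.5
    return J


def log_partition_bruteforce(J, beta):
    """F_N(beta) = log( 2^{-N} sum_sigma exp{(beta/2) sigma' J sigma} )."""
    n = len(J)
    total = 0.0
    for sigma in product((-1, 1), repeat=n):
        quad = sum(J[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))
        total += math.exp(0.5 * beta * quad)
    return math.log(total) - n * math.log(2.0)


def lower_bound(J, beta):
    """Right-hand side of (7.2): sum over i < j of log cosh(beta J(i,j))."""
    n = len(J)
    return sum(math.log(math.cosh(beta * J[i][j]))
               for i in range(n) for j in range(i + 1, n))


def upper_bound_cycle(n, beta):
    """Right-hand side of (7.1), using the known eigenvalues cos(2 pi k / n)."""
    return -0.5 * sum(math.log(1.0 - beta * math.cos(2.0 * math.pi * k / n))
                      for k in range(n))
```

With `J = cycle_couplings(10)` and `beta = 0.5`, the brute-force value `log_partition_bruteforce(J, beta)` should lie between `lower_bound(J, beta)` and `upper_bound_cycle(10, beta)`.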


7.1. Proof of Corollary 2.4 

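For all $\beta > 0$, by the bound (7.2) in Lemma 7.1, we get

\[
F_N(\beta) \ge \sum_{1\le i<j\le N}\log\cosh(\beta J_N(i,j)) \ge C_1\beta^2\sum_{1\le i<j\le N}J_N(i,j)^2, \tag{7.5}
\]

where $C_1 > 0$ is the infimum of $\log\cosh(x)/x^2$ over the bounded range of the arguments $x = \beta J_N(i,j)$. To get the upper bound, we use (7.1) for $\beta$ below the threshold in Corollary 2.4:

\[
F_N(\beta) \le \frac{1}{2}\sum_{i=1}^N\left\{-\log(1 - \beta\lambda_i(J_N)) - \beta\lambda_i(J_N)\right\}
\le \frac{C_2\beta^2}{2}\sum_{i=1}^N\lambda_i(J_N)^2
= \frac{C_2\beta^2}{2}\,\mathrm{tr}(J_N^2)
= \frac{C_2\beta^2}{2}\sum_{i,j=1}^N J_N(i,j)^2, \tag{7.6}
\]

where $C_2 = C_2(\beta)$ is the supremum of $(-\log(1-x) - x)/x^2$ over the range of the arguments $x = \beta\lambda_i(J_N)$, which is finite, and we use the fact
that $\sum_{i=1}^N\lambda_i(J_N) = 0$. The bounds (7.5) and (7.6) together imply (2.6) with $a_N = \sum_{i,j=1}^N J_N(i,j)^2$
for $\beta = \beta_0$ below this threshold. Therefore, if $a_N \to \infty$, part (a) follows by
Theorem 2.1.

Finally, if $\limsup_{N\to\infty}\sum_{i,j=1}^N J_N(i,j)^2 < \infty$, then $F_N(\beta) = O(1)$ for $\beta$ below the same threshold, and by
Theorem 2.3 part (b) follows.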

7.2. Proof of Example 1 

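It is well known that in the Curie-Weiss model for $\beta > 1$ (see [8], Example 3.9)

\[
\lim_{N\to\infty}\frac{1}{N}F_N(\beta) := F(\beta) \in (0, \infty). \tag{7.7}
\]

Note that

\[
\sigma'J_N\sigma = \frac{1}{N}\sum_{1\le i\ne j\le N/2}\sigma_i\sigma_j + \frac{1}{\sqrt{N}}\sum_{N/2 < i\ne j\le N/2+\sqrt{N}}\sigma_i\sigma_j
= \sigma_{(1)}'A_N\sigma_{(1)} + \sigma_{(2)}'B_N\sigma_{(2)},
\]

where $A_N$ is a $N/2 \times N/2$ matrix with $A_N(i,j) = 1/N$, for $i \ne j$, and $B_N$ is a $\sqrt{N} \times \sqrt{N}$
matrix with $B_N(i,j) = 1/\sqrt{N}$, for $i \ne j$, and $\sigma = (\sigma_{(1)}, \sigma_{(2)})'$. Therefore,

\[
e^{F_N(\beta)} = \mathbb{E}_0\,e^{\frac{\beta}{2}\sigma'J_N\sigma} = \mathbb{E}_0\,e^{\frac{\beta}{2}\sigma_{(1)}'A_N\sigma_{(1)}}\;\mathbb{E}_0\,e^{\frac{\beta}{2}\sigma_{(2)}'B_N\sigma_{(2)}}. \tag{7.8}
\]

Note that $\|J_N\| = 1$ and by (7.1) $F_N(\beta) = O(1)$ for $\beta < 1$. Thus, there exists no
sequence of consistent estimators for $\beta \in (0,1)$ by Theorem 2.3.

For $1 < \beta < 2$, by (7.7)

\[
0 < \liminf_{N\to\infty}\frac{1}{\sqrt{N}}F_N(\beta) \le \limsup_{N\to\infty}\frac{1}{\sqrt{N}}F_N(\beta) < \infty,
\]

since $\sigma_{(2)}'B_N\sigma_{(2)}$ is the Hamiltonian of a Curie-Weiss model of size $\sqrt{N}$. Moreover,
$|m_i(\tau)| \le 1$, for all $1 \le i \le N$ and $\tau \in S_N$; so taking $K = 1$, $\sum_{i=1}^N|m_i(\sigma)|\,1\{|m_i(\sigma)| > K\} = 0$,
establishing condition (a) of Theorem 2.1. Therefore, the MPLE $\{\hat\beta_N\}_{N\ge 1}$ is
$N^{1/4}$-consistent for $\beta \in (1,2)$ by Theorem 2.1.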

Similarly, for $\beta > 2$

\[
0 < \liminf_{N\to\infty}\frac{1}{N}F_N(\beta) \le \limsup_{N\to\infty}\frac{1}{N}F_N(\beta) < \infty,
\]

and so the MPLE $\{\hat\beta_N\}_{N\ge 1}$ is $\sqrt{N}$-consistent.


7.3. Proof of Corollary 3.1

Note that when the sufficient statistic is of the form (3.1), $|m_i(\tau)| \le 1$, for all $\tau \in S_N$.
Therefore, taking $K = 1$, $\sum_{i=1}^N|m_i(\sigma)|\,1\{|m_i(\sigma)| > K\} = 0$, which implies condition
(a) of Theorem 2.1. Moreover, in this case, $\|J_N\| = 1$, and $\sum_{i,j=1}^N J_N(i,j)^2 = N/d_N$.

Therefore, part (a) follows by Corollary 2.4.

By Corollary 2.2, to show part (b) it suffices to verify that condition (2.5) holds for
all $\beta_0 > 1$. This is done using the mean field approximation of Lemma C.1. By plugging
in the vector $(m, m, \ldots, m)'$ for the vector $z$ in the RHS of (C.1)

\[
F_N(\beta_0) \ge N\sup_{m\in[-1,1]}\left\{\frac{\beta_0 m^2}{2} - I(m)\right\}, \tag{7.9}
\]

where $I(x) := \frac{1}{2}(1+x)\log(1+x) + \frac{1}{2}(1-x)\log(1-x)$ for $x \in [-1,1]$. Thus, it suffices
to show that $\sup_{m\in[-1,1]}g(m) > 0$, where $g(m) := \frac{\beta_0 m^2}{2} - I(m)$. To this end, note that
$g(0) = 0$ and $g''(0) = \beta_0 - 1 > 0$, that is, $m = 0$ is not a local maximum of $g$. This implies the RHS of
(7.9) is positive, thus verifying condition (2.5). 
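
The sign change of the mean-field bound at $\beta_0 = 1$ is easy to see numerically. The following sketch, with hypothetical function names and an arbitrary grid resolution, evaluates $g(m) = \frac{\beta_0 m^2}{2} - I(m)$ on a grid over $[-1,1]$ and returns its maximum, using the convention $0\log 0 = 0$ at the endpoints.

```python
import math


def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)


def entropy_I(x):
    """I(x) = (1/2)[(1+x) log(1+x) + (1-x) log(1-x)] on [-1, 1]."""
    return 0.5 * (xlogx(1.0 + x) + xlogx(1.0 - x))


def g(m, beta0):
    """Mean-field objective g(m) = beta0 m^2 / 2 - I(m)."""
    return 0.5 * beta0 * m * m - entropy_I(m)


def sup_g(beta0, grid=2001):
    """Maximum of g over an evenly spaced grid on [-1, 1]; grid is odd so m = 0 is included."""
    return max(g(-1.0 + 2.0 * k / (grid - 1), beta0) for k in range(grid))
```

For $\beta_0 < 1$ the maximum is attained at $m = 0$ and equals zero, so the bound (7.9) is uninformative, while for $\beta_0 > 1$ the call `sup_g(beta0)` returns a strictly positive value.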


7.4. Proof of Corollary 3.2 


Let $d_i$ be the degree of the vertex $i$ in $G_N$, for $1 \le i \le N$. Then $|m_i(\tau)| \le \frac{d_i}{Np(N)}$ for all
$\tau \in S_N$. In the regime of $p(N)$ considered in Corollary 3.2, the maximum degree $\Delta = \max_{i\in V(G_N)}d_i =
Np(N)(1+o(1))$ with high probability [25]. Therefore, $|m_i(\tau)| \le 1+o(1)$ for all $1 \le i \le N$,
and by taking $K \ge 2$ it follows that $p(N)\sum_{i=1}^N|m_i(\sigma)|\,1\{|m_i(\sigma)| > K\} = 0$, with high
probability. This implies condition (a) of Theorem 2.1.

Moreover, in the same regime, $\|J_N\| = 1 + o(1)$ with high probability [25], and

\[
p(N)\sum_{i,j=1}^N J_N(i,j)^2 \xrightarrow{P} 1,
\]

and part (a) follows from Corollary 2.4.

To prove part (b), we use the mean field approximation as in Corollary 3.1. By plugging
in the vector $(m, m, \ldots, m)'$ for the vector $z$ in the RHS of (C.1), we get

\[
F_N(\beta_0) \ge N\sup_{m\in[-1,1]}\left\{\frac{\beta_0 m^2|E(G_N)|}{N^2p(N)} - I(m)\right\}.
\]

Condition (2.5) follows by arguments similar to those in Corollary 3.1 and the fact that
$\frac{2|E(G_N)|}{N^2p(N)} \xrightarrow{P} 1$.


8. Proofs of Theorems 3.3 and 3.4

In this section, we show the existence of an untestable/testable threshold in Ising models
on a converging sequence of dense graphs, and compute the distribution and asymptotic
power of the most powerful test, before the phase transition.

8.1. Proof of Theorem 3.3 

If $G_N$ converges to $W$, then $\|\frac{1}{N}A(G_N)\|$ converges to the operator norm $\|W\|$ (see
(3.4)). Moreover,

\[
\frac{1}{N^2}\sum_{i=1}^N\lambda_i(A(G_N))^2 \to t(C_2, W),
\]

and part (a) follows by Corollary 2.4.

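We now show (b). From [8], Theorem 2.14, when $G_N$ converges to $W$, then $\lim_{N\to\infty}\frac{1}{N}F_N(\beta) =
\mathcal{E}(W, \beta)$, where

\[
\mathcal{E}(W, \beta) := \sup_{m:[0,1]\to[-1,1]}\left\{\frac{\beta}{2}\int_{[0,1]^2}m(x)m(y)W(x,y)\,dx\,dy - \int_0^1 I(m(x))\,dx\right\}, \tag{8.1}
\]

and $I(x) = \frac{1}{2}\{(1+x)\log(1+x) + (1-x)\log(1-x)\}$ as in Corollary 3.1. By Corollary 2.2,
it is enough to show that $\mathcal{E}(W, \beta) > 0$, for $\beta > \frac{1}{\|W\|}$.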

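To this end, let $v_1(x)$ be the eigenfunction corresponding to the eigenvalue $\lambda = \|W\|$.
Then $|\lambda v_1(x)| = |\int_0^1 W(x,y)v_1(y)\,dy| \le 1$, and $\sup_{x\in[0,1]}|v_1(x)| < \infty$. Thus, there exists
$\delta > 0$ such that for $z \in (-\delta, \delta)$ we have $\sup_{x\in[0,1]}|zv_1(x)| < 1$, and

\[
\mathcal{E}(W, \beta) \ge \sup_{|z|<\delta}\left\{\frac{\beta z^2}{2}\int_{[0,1]^2}v_1(x)v_1(y)W(x,y)\,dx\,dy - \int_0^1 I(zv_1(x))\,dx\right\}
= \sup_{|z|<\delta}\left\{\frac{\beta\lambda z^2}{2}\int_0^1 v_1(x)^2\,dx - \int_0^1 I(zv_1(x))\,dx\right\}.
\]

Setting $h(z) := \frac{\beta\lambda z^2}{2}\int_0^1 v_1(x)^2\,dx - \int_0^1 I(zv_1(x))\,dx$, it suffices to show that $z = 0$
is not a point of local maximum of the function $h$. This follows on noting that $h(0) = 0$ and $h''(0) =
(\beta\lambda - 1)\int_0^1 v_1(x)^2\,dx > 0$.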


8.2. Proof of Theorem 3.4 

By Lemma D.1 (see Appendix D), the limiting distribution (3.6) is well defined.

The following proposition (proved in Appendix D) gives the limit of the log-partition
function, for a converging sequence of dense graphs, for $\beta < \frac{1}{\|W\|}$.

Proposition 8.1. Let $\{G_N\}_{N\ge 1}$ be a sequence of simple graphs converging in cut-metric
to $W \in \mathcal{W}$, such that $\int_{[0,1]^2}W(x,y)\,dx\,dy > 0$. Then for any $0 < \beta < \frac{1}{\|W\|}$,

\[
\lim_{N\to\infty}F_N(\beta) = -\frac{1}{2}\sum_{i=1}^\infty\left\{\log(1 - \beta\lambda_i(W)) + \beta\lambda_i(W)\right\}. \tag{8.2}
\]

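The above proposition can be used to complete the proof of Theorem 3.4 as follows:
Fix $\delta > 0$ such that $\beta + \delta < \frac{1}{\|W\|}$. Then for any $t \in (0, \delta)$,

\[
\mathbb{E}_\beta\exp\left\{\frac{t}{2}\cdot\frac{1}{N}\sigma'W_N\sigma\right\} = \exp\{F_N(\beta + t) - F_N(\beta)\}
\to \prod_{i=1}^\infty\frac{e^{-\frac{t\lambda_i(W)}{2}}}{\sqrt{1 - \frac{t\lambda_i(W)}{1 - \beta\lambda_i(W)}}}, \tag{8.3}
\]

by Proposition 8.1, where $W_N := A(G_N)$.

By Lemma D.1 the RHS above is the MGF of the random variable defined in
(3.6).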
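
As a sanity check of the limits (8.2) and (8.3), consider the complete graph, for which $W \equiv 1$ has the single non-zero eigenvalue $\lambda_1(W) = 1$. The sketch below is an illustration under that assumption, with $J_N = (\mathbf{1}\mathbf{1}' - I)/N$ and hypothetical function names: it computes $F_N(\beta)$ exactly from the distribution of $\sum_i\sigma_i$ and compares it with the limit $-\frac{1}{2}\{\log(1-\beta) + \beta\}$, and likewise compares $\exp\{F_N(\beta+t) - F_N(\beta)\}$ with $e^{-t/2}/\sqrt{1 - t/(1-\beta)}$.

```python
import math


def log_partition(n, beta):
    """Exact F_N(beta) for J_N = (11' - I)/N, summing over k = number of -1 spins."""
    logs = []
    for k in range(n + 1):
        s = n - 2 * k
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        logs.append(log_binom - n * math.log(2.0) + 0.5 * beta * (s * s - n) / n)
    mx = max(logs)
    return mx + math.log(sum(math.exp(v - mx) for v in logs))


def limit_log_partition(beta):
    """Right-hand side of (8.2) when lambda_1(W) = 1 is the only non-zero eigenvalue."""
    return -0.5 * (math.log(1.0 - beta) + beta)


def limit_mgf(beta, t):
    """Right-hand side of (8.3) in the same single-eigenvalue case."""
    return math.exp(-0.5 * t) / math.sqrt(1.0 - t / (1.0 - beta))
```

For instance, `log_partition(2000, 0.5)` should be close to `limit_log_partition(0.5)`, and `math.exp(log_partition(2000, 0.7) - log_partition(2000, 0.5))` close to `limit_mgf(0.5, 0.2)`.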

Appendix A: Proofs of technical lemmas 

In the appendix we prove the lemmas used in the proof of Theorem 2.1. The rest of
the section is organized as follows: Appendix A.1 contains the proof of Lemma 5.1. The
proofs of Lemmas 5.2 and 5.3 are given in Appendices A.2 and A.3, respectively.


A.1. Proof of Lemma 5.1

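By (2.6) there exists $\delta \in (0, \beta_0/2)$ such that $\liminf_{N\to\infty}\frac{1}{a_N}F_N(\beta_0 - \delta) > 0$. By the
monotonicity of $F_N'(\cdot)$,

\[
F_N(\beta_0 - \delta) = \int_0^{\beta_0-\delta}F_N'(t)\,dt \le (\beta_0 - \delta)F_N'(\beta_0 - \delta) \le \beta_0F_N'(\beta_0 - \delta),
\]

it follows that $\liminf_{N\to\infty}\frac{1}{a_N}F_N'(\beta_0 - \delta) > 0$. Thus, for any $\varepsilon > 0$

\[
\mathbb{P}_{\beta_0}(H_N(\sigma) < \varepsilon a_N) = \mathbb{P}_{\beta_0}\left(e^{-\frac{\delta}{2}H_N(\sigma)} > e^{-\frac{\delta}{2}\varepsilon a_N}\right) \le e^{\frac{\delta}{2}\varepsilon a_N + F_N(\beta_0-\delta) - F_N(\beta_0)},
\]

which, on taking logarithms, implies that

\[
\log\mathbb{P}_{\beta_0}(H_N(\sigma) < \varepsilon a_N) \le \frac{\delta\varepsilon a_N}{2} - \int_{\beta_0-\delta}^{\beta_0}F_N'(t)\,dt \le \frac{\delta\varepsilon a_N}{2} - F_N'(\beta_0 - \delta)\delta.
\]

Dividing both sides by $a_N$ and taking limits as $N \to \infty$ followed by $\varepsilon \to 0$ we have

\[
\lim_{\varepsilon\to 0}\limsup_{N\to\infty}\frac{1}{a_N}\log\mathbb{P}_{\beta_0}(H_N(\sigma) < \varepsilon a_N) \le -\delta\liminf_{N\to\infty}\frac{1}{a_N}F_N'(\beta_0 - \delta) < 0,
\]

thus completing the proof of (a).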

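To show (b), again invoking (2.6) there exists $\delta > 0$ such that $\limsup_{N\to\infty}\frac{1}{a_N}F_N(\beta_0 +
2\delta) < \infty$. Since

\[
F_N(\beta_0 + 2\delta) = \int_0^{\beta_0+2\delta}F_N'(t)\,dt \ge \delta F_N'(\beta_0 + \delta),
\]

it follows that $\limsup_{N\to\infty}\frac{1}{a_N}F_N'(\beta_0 + \delta) < \infty$. Thus, for any $K < \infty$

\[
\mathbb{P}_{\beta_0}(H_N(\sigma) > Ka_N) = \mathbb{P}_{\beta_0}\left(e^{\frac{\delta}{2}H_N(\sigma)} > e^{\frac{\delta}{2}Ka_N}\right) \le e^{-\frac{\delta}{2}Ka_N + F_N(\beta_0+\delta) - F_N(\beta_0)}.
\]

Taking logarithm on both sides,

\[
\log\mathbb{P}_{\beta_0}(H_N(\sigma) > Ka_N) \le -\frac{\delta Ka_N}{2} + \int_{\beta_0}^{\beta_0+\delta}F_N'(t)\,dt \le -\frac{\delta Ka_N}{2} + \delta F_N'(\beta_0 + \delta),
\]

from which dividing by $a_N$ and taking limits as $N \to \infty$ followed by $K \to \infty$ gives

\[
\lim_{K\to\infty}\limsup_{N\to\infty}\frac{1}{a_N}\log\mathbb{P}_{\beta_0}(H_N(\sigma) > Ka_N) = -\infty,
\]

thus proving part (b).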


A.2. Proof of Lemma 5.2 


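We begin with a technical estimate which will be needed to bound the second moment
of $L_\sigma(\beta_0)$.


Lemma A.1. Under assumption (2.6) and $m_i(\sigma)$ as defined in (2.3),

\[
\limsup_{N\to\infty}\frac{1}{a_N}\mathbb{E}_{\beta_0}\sum_{i=1}^N m_i(\sigma)\tanh(\beta_0m_i(\sigma)) < \infty.
\]


Proof. By (2.6) there exists $\delta > 0$ such that $\limsup_{N\to\infty}\frac{1}{a_N}F_N(\beta_0+\delta) < \infty$. Therefore,
$F_N(\beta_0 + \delta) = \int_0^{\beta_0+\delta}F_N'(t)\,dt \ge \delta F_N'(\beta_0)$, and so

\[
\limsup_{N\to\infty}\frac{1}{a_N}F_N'(\beta_0) < \infty. \tag{A.1}
\]

Now, observe that $m_i(\sigma)$ does not depend on $\sigma_i$, and $\mathbb{E}_{\beta_0}(\sigma_i|(\sigma_j)_{j\ne i}) = \tanh(\beta_0m_i(\sigma))$.
Since

\[
2F_N'(\beta_0) = \mathbb{E}_{\beta_0}H_N(\sigma) = \mathbb{E}_{\beta_0}\left(\sum_{i=1}^N\sigma_im_i(\sigma)\right) = \mathbb{E}_{\beta_0}\left(\sum_{i=1}^N m_i(\sigma)\tanh(\beta_0m_i(\sigma))\right),
\]

the result follows from (A.1).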


□ 
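
The identity $\mathbb{E}_{\beta_0}\sum_i\sigma_im_i(\sigma) = \mathbb{E}_{\beta_0}\sum_i m_i(\sigma)\tanh(\beta_0m_i(\sigma))$ used in the last step can be verified by exact enumeration on a small system. The sketch below is illustrative only: it assumes a symmetric coupling matrix with zero diagonal and the model $\mathbb{P}_\beta(\sigma) \propto \exp\{\frac{\beta}{2}\sigma'J\sigma\}$, and the function names and the random test matrix are hypothetical.

```python
import math
import random
from itertools import product


def both_sides(J, beta):
    """Return (E[sum_i sigma_i m_i], E[sum_i m_i tanh(beta m_i)]) under P_beta."""
    n = len(J)
    z = lhs = rhs = 0.0
    for sigma in product((-1, 1), repeat=n):
        m = [sum(J[i][j] * sigma[j] for j in range(n)) for i in range(n)]
        h = sum(si * mi for si, mi in zip(sigma, m))  # H_N(sigma) = sigma' J sigma
        w = math.exp(0.5 * beta * h)                  # unnormalised weight
        z += w
        lhs += w * h
        rhs += w * sum(mi * math.tanh(beta * mi) for mi in m)
    return lhs / z, rhs / z


def random_couplings(n, seed):
    """Hypothetical test matrix: symmetric, non-negative, zero diagonal."""
    rng = random.Random(seed)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.random() / n
    return J
```

Calling `both_sides(random_couplings(8, 0), 0.7)` returns two numbers that agree up to floating-point error; both equal $2F_N'(\beta_0)$.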


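The above lemma will be used to complete the proof of Lemma 5.2. To this end, for
$1 \le j \le N$ and $\tau \in S_N$, let

\[
\tau^{(j)} := (\tau_1, \ldots, \tau_{j-1}, -\tau_j, \tau_{j+1}, \ldots, \tau_N)
\]

and

\[
p_j(\tau) := \frac{e^{-\beta_0\tau_jm_j(\tau)}}{e^{\beta_0\tau_jm_j(\tau)} + e^{-\beta_0\tau_jm_j(\tau)}}.
\]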

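From equation (10) of Chatterjee [10] it follows that

\[
\mathbb{E}_{\beta_0}(L_\sigma(\beta_0)^2) = \frac{1}{N}\mathbb{E}_{\beta_0}\sum_{j=1}^N m_j(\sigma)\sigma_j\left(L_\sigma(\beta_0) - L_{\sigma^{(j)}}(\beta_0)\right)p_j(\sigma). \tag{A.2}
\]

Setting $r(x) := x\tanh(\beta_0x)$, note that

\[
L_\sigma(\beta_0) - L_{\sigma^{(j)}}(\beta_0) = \frac{4m_j(\sigma)\sigma_j}{N} + \frac{1}{N}\sum_{i=1}^N\left\{r(m_i(\sigma^{(j)})) - r(m_i(\sigma))\right\}. \tag{A.3}
\]

Now, by a second order Taylor expansion, using $m_i(\sigma^{(j)}) = m_i(\sigma) - 2J_N(i,j)\sigma_j$,

\[
\mathbb{E}_{\beta_0}(L_\sigma(\beta_0)^2) = \frac{a_N}{N^2}\mathbb{E}_{\beta_0}(T_1 + T_2 + T_3), \tag{A.4}
\]

where

\[
T_1 = \frac{4}{a_N}\sum_{j=1}^N m_j(\sigma)^2p_j(\sigma),
\qquad
T_2 = -\frac{2}{a_N}\sum_{i,j=1}^N J_N(i,j)r'(m_i(\sigma))m_j(\sigma)p_j(\sigma),
\]

and

\[
T_3 = \frac{2}{a_N}\sum_{i,j=1}^N r''(\theta_{ij}(\sigma))J_N(i,j)^2m_j(\sigma)\sigma_jp_j(\sigma),
\]

for some $\theta_{ij}(\sigma)$ between $m_i(\sigma^{(j)})$ and $m_i(\sigma)$. Therefore, to prove the lemma, it suffices
to control these three terms.

To control $T_1$, note that

\[
\mathbb{E}_{\beta_0}(p_j(\sigma)|(\sigma_i)_{i\ne j}) = \frac{2}{(e^{\beta_0m_j(\sigma)} + e^{-\beta_0m_j(\sigma)})^2} = \frac{1}{2}\mathrm{sech}^2(\beta_0m_j(\sigma)),
\]

and $x^2\,\mathrm{sech}^2(\beta_0x) \le M_1x\tanh(\beta_0x)$ for all $x \in \mathbb{R}$ for some $M_1 = M_1(\beta_0) < \infty$, which
gives

\[
\mathbb{E}_{\beta_0}T_1 = \frac{2}{a_N}\mathbb{E}_{\beta_0}\sum_{j=1}^N m_j(\sigma)^2\,\mathrm{sech}^2(\beta_0m_j(\sigma))
\le \frac{2M_1}{a_N}\mathbb{E}_{\beta_0}\sum_{j=1}^N m_j(\sigma)\tanh(\beta_0m_j(\sigma)), \tag{A.5}
\]

which is bounded as $N \to \infty$ by an application of Lemma A.1.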
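Now, let us bound $T_2$. By the Cauchy-Schwarz inequality,

\[
|T_2| \le \frac{2}{a_N}\left[\sum_{i=1}^N r'(m_i(\sigma))^2\right]^{1/2}\left[\sum_{i=1}^N\left(\sum_{j=1}^N J_N(i,j)m_j(\sigma)p_j(\sigma)\right)^2\right]^{1/2}
\le \frac{2\|J_N\|}{a_N}\left[\sum_{i=1}^N r'(m_i(\sigma))^2\right]^{1/2}\left[\sum_{j=1}^N m_j(\sigma)^2p_j(\sigma)\right]^{1/2},
\]

where the last step uses $p_j(\sigma)^2 \le p_j(\sigma)$. Taking expectation on both sides above and using the Cauchy-Schwarz inequality again,

\[
\mathbb{E}_{\beta_0}|T_2| \le \frac{2\|J_N\|}{a_N}\left\{\mathbb{E}_{\beta_0}\sum_{i=1}^N r'(m_i(\sigma))^2\cdot\mathbb{E}_{\beta_0}\sum_{j=1}^N m_j(\sigma)^2p_j(\sigma)\right\}^{1/2}. \tag{A.6}
\]

Now, since $r'(x)^2 = \{\tanh(\beta_0x) + \beta_0x\,\mathrm{sech}^2(\beta_0x)\}^2 \le M_2x\tanh(\beta_0x)$, for some constant
$M_2 = M_2(\beta_0)$, by Lemma A.1

\[
\limsup_{N\to\infty}\frac{1}{a_N}\mathbb{E}_{\beta_0}\sum_{i=1}^N r'(m_i(\sigma))^2 < \infty.
\]

Using this along with (A.5) in (A.6) gives $\limsup_{N\to\infty}\mathbb{E}_{\beta_0}|T_2| < \infty$.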

It remains to bound $T_3$. Since $M_3 = M_3(\beta_0) := \sup_{x\in\mathbb{R}}|r''(x)| < \infty$, we have

\[
|T_3| \le \frac{2M_3}{a_N}\sum_{i,j=1}^N J_N(i,j)^2|m_j(\sigma)|p_j(\sigma)
\le \frac{2M_3}{a_N}\left[\sum_{j=1}^N\left(\sum_{i=1}^N J_N(i,j)^2\right)^2\right]^{1/2}\left[\sum_{j=1}^N m_j(\sigma)^2p_j(\sigma)\right]^{1/2} \tag{A.7}
\]

\[
\le \frac{2M_3\|J_N\|}{a_N}\left[\sum_{i,j=1}^N J_N(i,j)^2\right]^{1/2}\left[\sum_{j=1}^N m_j(\sigma)^2p_j(\sigma)\right]^{1/2},
\]

where the last step uses $\sum_{i=1}^N J_N(i,j)^2 = \|J_Ne_j\|^2 \le \|J_N\|^2$. Finally, taking expectations
on both sides in (A.7), and using condition (b) on the first term, and (A.5) on the second
term, gives $\limsup_{N\to\infty}\mathbb{E}_{\beta_0}|T_3| < \infty$.


A.3. Proof of Lemma 5.3 

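Fixing $\delta > 0$, by Lemma 5.1(a) there exists $\varepsilon = \varepsilon(\delta) > 0$ such that

\[
\mathbb{P}_{\beta_0}(H_N(\sigma) < 3\varepsilon\beta_0a_N) \le \delta, \tag{A.8}
\]

for $N$ large enough. Also, using Lemma 5.2 and Chebyshev's inequality, for $K_1 =
K_1(\delta)$ large enough we have

\[
\mathbb{P}_{\beta_0}\left(|L_\sigma(\beta_0)| > K_1\frac{\sqrt{a_N}}{N}\right) \le \frac{N^2}{K_1^2a_N}\mathbb{E}_{\beta_0}L_\sigma(\beta_0)^2 \le \delta. \tag{A.9}
\]

Moreover, by condition (a) in Theorem 2.1 there exists $K_2 = K_2(\delta) < \infty$ such that
for all $N$ large enough we have

\[
\mathbb{E}_{\beta_0}\sum_{j=1}^N|m_j(\sigma)|\,1\{|m_j(\sigma)| > K_2\} \le \varepsilon\delta\beta_0a_N
\]

and so by Markov's inequality

\[
\mathbb{P}_{\beta_0}\left(\sum_{j=1}^N|m_j(\sigma)|\,1\{|m_j(\sigma)| > K_2\} > \varepsilon\beta_0a_N\right)
\le \frac{\mathbb{E}_{\beta_0}\sum_{j=1}^N|m_j(\sigma)|\,1\{|m_j(\sigma)| > K_2\}}{\varepsilon\beta_0a_N} \le \delta. \tag{A.10}
\]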
Defining 


AAr((5) < cr e S'jy : H]si{a) > 3e/3oaAr, |Lct(/3o)| < Ki 


y/ON 

N ’ 


N 


{|TOj(cr)| > K 2 } > e/doaN >, 


we have P/3 q(Ajv(5 )) > 1 — 3(5, for N large enough (by combining (A.8), (A.9), and 
(A.IO)). 
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Now, on the set A]^(S) using the bounds tanhx < x on a; < and tanhx < 1 on 
X > K2, 


N N 

< K 2 } + e/3oaN > mi{a) tanh(/3om^(cr)) 

i=l 1=1 

= HN{<j)-NL^i/3o) 

> SePoaN - Ki^/on. 

Thus, on the set An{S), 


N 

< K 2 } > 2eaN 


—^y/dN > EdM 


for all N large, completing the proof. 


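Appendix B: Proof of Proposition 6.1

For every $N \ge 1$, let $\mathbb{P}_N$ and $\mathbb{Q}_N$ be two distributions on a common measure space.
Recall the definition of the Kullback-Leibler divergence $D(\mathbb{Q}_N\|\mathbb{P}_N)$
from (6.1), and consider the problem of testing $\mathbb{P}_N$ versus $\mathbb{Q}_N$ such that condition (6.2)
holds. Write $L_N^+ := \max(L_N, 0)$ and $L_N^- := \max(-L_N, 0)$. Since $D(\mathbb{Q}_N\|\mathbb{P}_N) = \mathbb{E}_{\mathbb{Q}_N}L_N$, by assumption (6.2)

\[
0 \le \mathbb{E}_{\mathbb{Q}_N}L_N^+ - \mathbb{E}_{\mathbb{Q}_N}L_N^- \le M_1, \tag{B.1}
\]

for some $M_1 < \infty$ and all large $N$. Also, there exists $M_2 < \infty$ such that $\mathbb{E}_{\mathbb{Q}_N}L_N^- \le M_2$,
for all $N$. To see this, note that

\[
\mathbb{E}_{\mathbb{Q}_N}L_N^- = -\sum_{s=1}^\infty\mathbb{E}_{\mathbb{Q}_N}L_N1\{-s \le L_N < -s+1\}
\le \sum_{s=1}^\infty se^{-(s-1)}\,\mathbb{P}_N(-s \le L_N < -s+1)
\le \sum_{s=1}^\infty se^{-(s-1)} := M_2 < \infty. \tag{B.2}
\]

Hence, by (B.1) and (B.2), $\mathbb{E}_{\mathbb{Q}_N}|L_N| = \mathbb{E}_{\mathbb{Q}_N}L_N^+ + \mathbb{E}_{\mathbb{Q}_N}L_N^- \le M_1 + 2M_2 =: M < \infty$.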

Therefore, by Markov's inequality, for any $\varepsilon > 0$

\[
\mathbb{Q}_N(|L_N| > M/\varepsilon) \le \frac{\varepsilon}{M}\mathbb{E}_{\mathbb{Q}_N}(|L_N|) \le \varepsilon.
\]

Now, suppose there exists a sequence of test functions $\phi_N$ such that $\mathbb{E}_{\mathbb{P}_N}\phi_N \to 0$. Then

\[
\mathbb{E}_{\mathbb{Q}_N}\phi_N \le \mathbb{Q}_N(|L_N| > M/\varepsilon) + \mathbb{E}_{\mathbb{Q}_N}(\phi_N1\{|L_N| \le M/\varepsilon\}) \le \varepsilon + e^{M/\varepsilon}\,\mathbb{E}_{\mathbb{P}_N}\phi_N.
\]

Taking limits on both sides gives $\limsup_{N\to\infty}\mathbb{E}_{\mathbb{Q}_N}\phi_N \le \varepsilon$. Since $\varepsilon > 0$ is arbitrary,
$\lim_{N\to\infty}\mathbb{E}_{\mathbb{Q}_N}\phi_N = 0$, that is, $\phi_N$ is not a consistent sequence of test functions.


A standard technique to derive a lower bound on the log-partition function is the mean-field
approximation (refer to [13] for details). Here, we give a short proof for the sake of
completeness.

Lemma C.1. Consider the family of probability distributions on $S_N$ given by (1.2).
Then for any matrix $J_N$,

\[
F_N(\beta) \ge \sup_{z\in[-1,1]^N}\left\{\frac{\beta}{2}z'J_Nz - \sum_{i=1}^N I(z_i)\right\}, \tag{C.1}
\]

where $I(x) = \frac{1}{2}[(1+x)\log(1+x) + (1-x)\log(1-x)]$ for $x \in [-1,1]$.


Proof. Let $D(\cdot\|\cdot)$ be the Kullback-Leibler divergence between two probability measures.
By a direct computation, for any probability mass function $\nu$ on $S_N = \{-1,1\}^N$ we
have

\[
D(\nu\|\mathbb{P}_\beta) = F_N(\beta) + N\log 2 + \mathbb{E}_\nu\log\nu(\sigma) - \frac{\beta}{2}\mathbb{E}_\nu H_N(\sigma).
\]

Now, since $D(\nu\|\mathbb{P}_\beta) \ge 0$ we have

\[
F_N(\beta) \ge \frac{\beta}{2}\mathbb{E}_\nu H_N(\sigma) - \mathbb{E}_\nu\log\nu(\sigma) - N\log 2.
\]

One can obtain a lower bound on $F_N(\beta)$ by taking the supremum of the RHS over product
measures, that is, $\nu(\sigma) = \prod_{i=1}^N\nu_i(\sigma_i)$. Hence, setting $z_i = \mathbb{E}_{\nu_i}\sigma_i \in [-1,1]$,
the bound in (C.1) follows. □


Appendix D: Proof of Proposition 8.1 

We begin by deriving the MGF of the limiting distribution (3.6). The proof involves 
straightforward calculations using the MGF of the chi-squared distribution, similar to 
[6], Proposition 7.1. 

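Lemma D.1. Let $\{a_i\}_{i\ge 1}, \{b_i\}_{i\ge 1}$ be sequences of real numbers such that $\sum_{i=1}^\infty a_i^2 < \infty$
and $\sum_{i=1}^\infty(a_i - b_i) = \mu$ for some finite real number $\mu$. Suppose $\xi_1, \xi_2, \ldots$ are i.i.d. $\chi_1^2$ random
variables.

(a) Then the sum $S := \frac{1}{2}\sum_{i=1}^\infty(a_i\xi_i - b_i)$ converges almost surely and in $L^2$.

(b) Moreover, if $M := \sup_{i\ge 1}|a_i| < \infty$, then for $0 < t < \frac{1}{M}$,

\[
\mathbb{E}e^{tS} = \prod_{i=1}^\infty\frac{e^{-\frac{1}{2}tb_i}}{\sqrt{1 - ta_i}}. \tag{D.1}
\]


Proof. By defining $S_N := \frac{1}{2}\sum_{i=1}^N(a_i\xi_i - b_i)$ and $\mathcal{F}_N := \sigma(\xi_1, \ldots, \xi_N)$, it follows that
$(S_N - \mathbb{E}S_N, \mathcal{F}_N)$ is a martingale, with

\[
\limsup_{N\to\infty}\mathbb{E}(S_N - \mathbb{E}S_N)^2 = \frac{1}{2}\sum_{i=1}^\infty a_i^2 < \infty.
\]

Since $\mathbb{E}S_N = \frac{1}{2}\sum_{i=1}^N(a_i - b_i) \to \mu/2$, the sequence $S_N$ converges almost surely and in $L^2$ [17].

To compute the moment generating function of $S$, first note that $e^{tS_N} \to e^{tS}$ almost surely. Thus if
the collection of random variables $\{e^{tS_N}\}$ is uniformly integrable, then we have

\[
\mathbb{E}e^{tS} = \lim_{N\to\infty}\mathbb{E}e^{tS_N} = \lim_{N\to\infty}\prod_{i=1}^N\frac{e^{-\frac{1}{2}tb_i}}{\sqrt{1 - ta_i}} = \prod_{i=1}^\infty\frac{e^{-\frac{1}{2}tb_i}}{\sqrt{1 - ta_i}},
\]

thus completing the proof of the lemma. It thus remains to prove uniform integrability,
for which it suffices to show that for some $\delta > 0$ we have $\limsup_{N\to\infty}\mathbb{E}e^{(t+\delta)S_N} < \infty$.

Since $t < \frac{1}{M}$ there exists $\delta > 0$ such that $t + \delta < \frac{1}{M}$. For this $\delta$, setting $t' := t + \delta$ we have

\[
\log\mathbb{E}e^{t'S_N} = \frac{1}{2}\sum_{i=1}^N\{-t'b_i - \log(1 - t'a_i)\}. \tag{D.2}
\]

Now setting $C := \sup_{|x|\le t'M}\frac{-\log(1-x) - x}{x^2} < \infty$, we have $-\log(1-x) - x \le Cx^2$ for
$|x| \le t'M$, and so the RHS of (D.2) can be bounded by $\frac{1}{2}\sum_{i=1}^N\{t'(a_i - b_i) + Ct'^2a_i^2\}$, which
converges to $\frac{1}{2}\{t'\mu + Ct'^2\sum_{i=1}^\infty a_i^2\}$. Therefore, $\{e^{tS_N}\}$ is uniformly integrable, thus completing
the proof of the lemma. □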


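The above lemma can be used to complete the proof of Proposition 8.1. To this end,
let $W_N := A(G_N)$. Then, by [6], Theorem 1.4, it follows that

\[
\frac{1}{N}\sigma'W_N\sigma \xrightarrow{D} \sum_{i=1}^\infty\lambda_i(W)(\xi_i - 1),
\]

where $\xi_1, \xi_2, \ldots$ are i.i.d. $\chi_1^2$ random variables. Thus,

\[
\exp\left\{\frac{\beta}{2}\cdot\frac{1}{N}\sigma'W_N\sigma\right\} \xrightarrow{D} \exp\left\{\frac{\beta}{2}\sum_{i=1}^\infty\lambda_i(W)(\xi_i - 1)\right\}. \tag{D.3}
\]

If the LHS in (D.3) is uniformly integrable, then

\[
\lim_{N\to\infty}\mathbb{E}_0\exp\left\{\frac{\beta}{2}\cdot\frac{1}{N}\sigma'W_N\sigma\right\} = \mathbb{E}\exp\left\{\frac{\beta}{2}\sum_{i=1}^\infty\lambda_i(W)(\xi_i - 1)\right\}
= \prod_{i=1}^\infty\frac{e^{-\frac{\beta\lambda_i(W)}{2}}}{\sqrt{1 - \beta\lambda_i(W)}}, \tag{D.4}
\]

where the last equality uses Lemma D.1. The proof of (8.2) then follows on taking log
of both sides of the above equality.

It remains to show that the LHS in (D.3) is uniformly integrable, that is,

\[
\limsup_{N\to\infty}\log\mathbb{E}_0\exp\left\{\frac{\beta+\delta}{2}\cdot\frac{1}{N}\sigma'W_N\sigma\right\} = \limsup_{N\to\infty}F_N(\beta + \delta) < \infty, \tag{D.5}
\]

for some $\delta > 0$. To this end, note that if $0 < \beta < 1/\|W\|$, there exists $\delta > 0$ such that
$\gamma := \beta + \delta < 1/\|W\|$. Now, using (7.3) and the fact $\mathrm{tr}(W_N) = 0$, we have

\[
F_N(\gamma) \le \frac{1}{2}\sum_{i=1}^N\left\{-\log\left(1 - \frac{\gamma\lambda_i(W_N)}{N}\right) - \frac{\gamma\lambda_i(W_N)}{N}\right\}. \tag{D.6}
\]

Since $W_N \Rightarrow W$ in the cut metric, $\lim_{N\to\infty}\frac{\gamma}{N}\|W_N\| = \gamma\|W\| < 1$, and so there exists
$\varepsilon > 0$ such that for all $N$ large enough $\frac{\gamma}{N}\|W_N\| \le 1 - \varepsilon$. For $x \le 1 - \varepsilon$ there exists
$M = M(\varepsilon)$ such that $-\log(1-x) - x \le Mx^2$. Using this, the RHS of (D.6) can be
bounded by $\frac{M\gamma^2}{2N^2}\sum_{i=1}^N\lambda_i(W_N)^2$, which converges to a finite limit as $N \to \infty$ (recall that
$\frac{1}{N^2}\sum_{i=1}^N\lambda_i(W_N)^2 \to t(C_2, W)$). This
proves (D.5) and completes the proof of the proposition.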